Abstract. Galled trees, directed acyclic graphs that model evolutionary histories 
with isolated hybridization events, have become very popular due to both their 
biological significance and the existence of polynomial time algorithms for their 
reconstruction. In this paper we establish to what extent several distance measures 
for the comparison of evolutionary networks are metrics for galled trees, and hence 
when they can be safely used to evaluate galled tree reconstruction methods. 

1 Introduction 

The study of phylogenetic networks as a model of reticulate evolution began with 
the representation of conflicting phylogenetic signals as an implicit splits network [2, 
22], but it was soon realized that internal nodes in a splits network did not have 
any direct interpretation in evolutionary terms. Attention then turned to the study 
of explicit evolutionary networks, in which the internal nodes have a direct 
interpretation as reticulate evolutionary events such as recombination, hybridization, or 
lateral gene transfer. Unfortunately, the hardness of reconstructing an evolutionary 
network with as few recombination events as possible for a set of sequences, under 
the assumption of no repeated or back mutations, was soon established [6, 36, 37]. 

However, when the conflicting phylogenetic signals show a particular structure, 
such that the conflict graph of the set of sequences is biconvex, the evolutionary 
network with the smallest possible number of recombination events is unique, it 
can be reconstructed in polynomial time and it is a galled tree, an evolutionary 
network with hybrid nodes of in-degree 2 (because they correspond to explicit re- 
combinations) and disjoint reticulation cycles [19]. Galled trees are also relevant 
from a biological point of view because, as Gusfield et al. point out in loc. cit., 
reticulation events tend to be isolated, yielding disjoint reticulation cycles, if 
the level of recombination is moderate, or if most of the observable recombinations 
are recent. Actually, several slightly different notions of galled tree have been in- 
troduced so far in the literature, depending on the degree of disjointness of their 
reticulation cycles. The original galled trees [19] have node-disjoint reticulation 
cycles, while the nested networks with nesting level 1 [25, 27] (dubbed, for simplic- 
ity, 1-nested networks in this paper) have arc-disjoint reticulation cycles. Between 
both notions lie the level-1 networks [16, 26], without biconnected components with 
more than one hybrid node. We have studied the relationships among these types 
of networks [35]: see Section 3 below. 

Now, various algorithms are known for reconstructing galled trees from either 
sequences [18-21], trees [33], distances [15], splits [23], or triplets [24], and metrics 
provide a safe way to assess phylogenetic reconstruction methods [28,32]. A few 
polynomial time computable metrics, like the path multiplicity or μ-distance [14] 
or Nakhleh's metric m for reduced networks [31], are known for tree-child evolutionary 
networks [14], which include galled trees [35]. But most distance measures 
introduced so far were only known to be metrics on time consistent [4] tree-child 
phylogenetic networks, including the Robinson-Foulds distance [3, 13, 10], the tri- 
partitions distance [13, 29], the nodal and splitted nodal distances [9, 11], and the 
triplets distance [11]. Since galled trees need not be time consistent, it was not 
known whether these distance measures define metrics for galled trees. On the 
other hand, Nakhleh gave in his PhD Thesis [30] two metrics for time consistent 
galled trees (based on splits and subtrees), but they are not metrics for arbitrary 
galled trees [14]. Recent simulation studies using the coalescent model with re- 
combination show that only a small fraction of the simulated galled trees are time 
consistent [1]. 

In this paper, we study which of the aforementioned metrics for tree-child time 
consistent phylogenetic networks are also metrics for galled trees, under the various 
notions of the latter. We show that the Robinson-Foulds distance is only a metric 
in the binary case (in which the original galled trees, the level-1 networks and the 
1-nested networks are the same objects); the tripartitions distance is a metric for 
1-nested networks without any restriction on the degrees of their nodes (besides 
the general restriction that hybrid nodes have in-degree 2); and the splitted nodal 
distance is a metric in the semibinary case (hybrid nodes of in-degree 2 and 
out-degree 1), in which the 1-nested and level-1 conditions define the same objects, 
although they are strictly weaker than the node-disjoint reticulation cycles condition. 
On the other hand, neither the nodal distance nor the triplets distance is a metric, 
even in the most restrictive case of binary galled trees. 

2 Preliminaries 

Given a set S, an S-rDAG is a rooted directed acyclic graph with its leaves 
bijectively labeled in S. 

A tree node of an S-rDAG N = (V,E) is a node of in-degree at most 1, and a 
hybrid node is a node of in-degree at least 2. A tree arc (respectively, a hybridization 
arc) is an arc with head a tree node (respectively, a hybrid node). A node v ∈ V 
is a child of u ∈ V if (u, v) ∈ E; we also say in this case that u is a parent of v. 
Two nodes are siblings when they have a common parent. 

We denote by u ⇝ v any path in N with origin u and end v. Whenever there 
exists a path u ⇝ v, we shall say that v is a descendant of u and also that u is an 
ancestor of v. The length of a path is its number of arcs, and the distance from a 
node u to a descendant v of it is the length of a shortest path u ⇝ v. 

A node v is a strict descendant of a node u in N when every path from the root 
of N to v contains the node u; thus, v is a non-strict descendant of u when it is 
a descendant of u, but there exist paths from the root to v that do not contain u. 
The following straightforward result, which is Lemma 1 in [10], will be used often, 
usually without any further notice. 

Lemma 1. Every strict ancestor of a node v is connected by a path with every 
ancestor of v. □ 

A tree path is a path consisting only of tree arcs, and a node v is a tree 
descendant of a node u when there is a tree path u ⇝ v. The following result summarizes 
Lemma 3 and Corollary 4 in [13], and it will also be used many times in this paper 
without any further notice. 

Lemma 2. Let u ⇝ v be a tree path in an S-rDAG. 

(1) Every other path w ⇝ v ending in v either is contained in u ⇝ v or contains 
u ⇝ v. In particular, if w is a descendant of u and there exists a path w ⇝ v, 
then this path is contained in the tree path u ⇝ v. 

(2) The tree path u ⇝ v is the unique path from u to v. 

(3) The node v is a strict descendant of u. □ 

Two paths in an S-rDAG are internally disjoint when they have disjoint sets of 
intermediate nodes. A reticulation cycle for a hybrid node h is a pair of internally 
disjoint paths ending in h and with the same origin. Each one of the paths forming 
a reticulation cycle for h is called generically a merge path, their common origin is 
called the split node of the reticulation cycle, and the hybrid node h, its end. The 
intermediate nodes of a reticulation cycle are the intermediate nodes of the merge 
paths forming it. 

A subgraph of an undirected graph is biconnected when it is connected and it 
remains connected if we remove any node and all edges incident to it. A subgraph 
of an S-rDAG N is said to be biconnected when it is so in the undirected graph 
associated to N. 
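
The definitions of this section can be made concrete with a small sketch. It assumes a hypothetical encoding that is not part of the paper: a network is a Python dict mapping each node name to the list of its children, and a leaf is a node with an empty list whose name is its label. Under that assumption, the code classifies nodes as tree or hybrid and computes strict descendants by removing u and checking which descendants of u are no longer reachable from the root.

```python
def parents_of(children):
    """Invert a node -> children mapping into a node -> parents mapping."""
    parents = {v: [] for v in children}
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    return parents


def node_types(children):
    """Label every node 'tree' (in-degree <= 1) or 'hybrid' (in-degree >= 2)."""
    parents = parents_of(children)
    return {v: "hybrid" if len(parents[v]) >= 2 else "tree" for v in children}


def reachable(children, start, avoid=None):
    """Nodes reachable from `start` along paths that never visit `avoid`."""
    if start == avoid:
        return set()
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in children[x]:
            if y != avoid and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def strict_descendants(children, root, u):
    """Descendants v of u such that every path from the root to v contains u."""
    return reachable(children, u) - reachable(children, root, avoid=u)


# Illustrative network (an assumption, not a figure of the paper):
# root r with children a and b; a -> 1, h; b -> h, 3; hybrid node h -> 2.
NETWORK = {
    "r": ["a", "b"],
    "a": ["1", "h"],
    "b": ["h", "3"],
    "h": ["2"],
    "1": [], "2": [], "3": [],
}
```

In this example, h is the only hybrid node, and the leaf 2 is a non-strict descendant of a, because the path r, b, h, 2 avoids a.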

3 1-nested networks 

In the rest of this paper, by a hybridization network on a set S we understand 
an S-rDAG without out-degree 1 tree nodes and with all its hybrid nodes of 
in-degree 2. We shall also use the term hybridization network with n leaves to refer 
to a hybridization network on a set S with n elements. A phylogenetic tree is a 
hybridization network without hybrid nodes. 

We shall say that a hybridization network is semibinary when its hybrid nodes 
have out-degree 1, and that it is binary when it is semibinary and its internal tree 
nodes have out-degree 2. 

A hybridization network is: 

- a galled tree, when every pair of reticulation cycles have disjoint sets of nodes 
[19]; 

- 1-nested, when every pair of reticulation cycles have disjoint sets of arcs: by 
[35, Prop. 12], this is equivalent to the fact that every pair of reticulation cycles 
for different hybrid nodes have disjoint sets of intermediate nodes, and hence it 
corresponds to the notion of nested (hybridization) network with nesting depth 
1 [25, 27]; 

- level-1, when no biconnected component contains more than 1 hybrid node [16, 
26]. 

To simplify the language, from now on we shall write simply 1-nested network to 
mean a 1-nested hybridization network. The following two results summarize the 
main results on 1-nested networks proved in [35]. 

Lemma 3. In a 1-nested network, every hybrid node is the end of exactly one 
reticulation cycle, and all the intermediate nodes of this reticulation cycle are of 
tree type. □ 

Theorem 1. (a) Every 1-nested network is tree-child, in the sense that every 
internal node has a child of tree type. 

(b) For general hybridization networks, 

galled tree ⟹ level-1 ⟹ 1-nested, 

and these implications are strict. 

(c) For semibinary hybridization networks, 

galled tree ⟹ level-1 ⟺ 1-nested, 

and the first implication is strict. 

(d) For binary hybridization networks, 

galled tree ⟺ level-1 ⟺ 1-nested. □ 

The fact that every 1-nested network is tree-child implies, by [13, Lem. 2] the 
following result. 

Corollary 1. Every node in a 1-nested network has some tree descendant leaf, 
and hence some strict descendant leaf. □ 

The following result lies at the basis of most of our proofs. 

Proposition 1. Every 1-nested network contains some internal tree node with 
all its children tree leaves, or a hybrid node with all its children tree leaves and 
such that all the intermediate nodes in its reticulation cycle have all their children 
outside the reticulation cycle tree leaves. 

Proof. Let N be a 1-nested network. Let the galled-length of a path in N be 
the number of reticulation cycles which the arcs of the path belong to, and the 
galled-depth of a node in N the largest galled-length of a path from the root to 
it. Notice that the galled-depth of a hybrid node is equal to the galled-depth of 
the intermediate nodes of its reticulation cycle (because every arc in N belongs at 
most to one reticulation cycle). 

Assume that N does not contain any internal tree node with all its children tree 
leaves. Let h be a hybrid node of largest galled-depth in N, and let v denote either 
h or any intermediate node in the reticulation cycle K for h. It turns out that 
v has no hybrid descendant other than h, because any path from v to any other 
hybrid node h' ≠ h would contain arcs belonging to at least one more reticulation 
cycle, making the galled-depth of h' larger than that of v. 

Let v' be any descendant of v not belonging to K. Then, v' is a tree node and 
all its descendants are tree nodes, and therefore, since we assume that N does not 
contain any internal tree node with all its children tree leaves, we conclude that v' 
is a tree leaf. 

4 Reductions for 1-nested networks 

We introduce in this section a set of reductions for 1-nested networks. Each of 
these reductions, when applied to a 1-nested network with n leaves and m nodes, 
produces a 1-nested network with at most n leaves and less than m nodes, and 
given any 1-nested network with more than one leaf, it is always possible to apply 
to it some of these reductions. We shall also show that suitable subsets of these 
reductions have similar properties for binary and for semibinary 1-nested networks. 
Similar sets of reductions for other types of evolutionary networks have already 
been published [7, 11]. 

The R reductions. Let N be a 1-nested network with n leaves, and let u be an 
internal node whose children are exactly the tree leaves i and j. The $R_{i;j}$ reduction 
of N is the network $R_{i;j}(N)$ obtained by removing the leaves i and j, together with 
their incoming arcs, and labeling with i their former common parent u, which has 
now become a leaf; cf. Fig. 1 (see footnote 3). It is clear that $R_{i;j}(N)$ is a 1-nested 
network on S \ {j}, and it has 2 nodes less than N. 

Fig. 1. The $R_{i;j}$ reduction. 

3 In graphical representations of hybridization networks, we shall represent hybrid nodes by 
squares, tree nodes by circles, and indeterminate (that is, that can be of tree or hybrid type) 
nodes by pentagons. 

The T reductions. Let N be a 1-nested network with n leaves, and let u be an 
internal node with two tree leaf children i, j and at least one other child. The $T_{i;j}$ 
reduction of N is the network $T_{i;j}(N)$ obtained by removing the leaf j together 
with its incoming arc; cf. Fig. 2. It is clear that $T_{i;j}(N)$ is a 1-nested network on 
S \ {j} with 1 node less than N. 
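
The R and T reductions can be sketched in a few lines. The following is only an illustration under assumptions that are not in the paper: networks are encoded as dicts mapping each node name to its list of children, leaf names double as labels, and the hypothetical helper `reduce_R_or_T` chooses between the two reductions according to the out-degree of the common parent.

```python
def reduce_R_or_T(children, i, j):
    """Apply the R or the T reduction for tree leaves i, j with a common parent.

    Returns the pair (kind, reduced_network) and leaves the input untouched.
    """
    parents = {v: [u for u, kids in children.items() if v in kids]
               for v in children}
    for leaf in (i, j):
        # A tree leaf has no children and exactly one parent.
        if children[leaf] or len(parents[leaf]) != 1:
            raise ValueError(f"{leaf!r} is not a tree leaf")
    (u,), (w,) = parents[i], parents[j]
    if u != w:
        raise ValueError("the two leaves are not siblings")

    new = {x: list(kids) for x, kids in children.items()}
    if set(new[u]) == {i, j}:
        # R reduction: drop both leaves; their parent u becomes the leaf i.
        del new[i], new[j], new[u]
        for x in new:
            new[x] = [i if y == u else y for y in new[x]]
        new[i] = []
        return "R", new
    # T reduction: u has further children, so only the leaf j is removed.
    new[u].remove(j)
    del new[j]
    return "T", new


# Illustrative inputs (assumptions): a binary tree and a star tree on {1, 2, 3}.
TREE = {"r": ["x", "3"], "x": ["1", "2"], "1": [], "2": [], "3": []}
STAR = {"r": ["1", "2", "3"], "1": [], "2": [], "3": []}
```

On `TREE` the helper performs an R reduction and the result has 2 nodes less, while on `STAR` it performs a T reduction and the result has 1 node less, in agreement with the counts stated above.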




Fig. 2. The $T_{i;j}$ reduction. 



The G reductions. Let N be a 1-nested network with n leaves. Assume that N 
contains a reticulation cycle K consisting of two merge paths $(u, v_1, \ldots, v_k, h)$ and 
$(u, v'_1, \ldots, v'_{k'}, h)$, with $k \geq k'$ ($k'$ can be 0, in which case the corresponding merge 
path is simply the arc (u, h), but then k > 0), such that 

- the hybrid node h has only one child, and it is the tree leaf i; 

- each intermediate node of K has only one child outside K, and it is a tree leaf: 
the child outside K of each $v_j$ is the leaf $i_j$ and the child outside K of each $v'_j$ 
is the leaf $i'_j$. 

Notice that u may have children outside K. 

The $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction of N is the network 
$G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$ obtained by removing the nodes 
$v_1, \ldots, v_k, v'_1, \ldots, v'_{k'}, h$ and the leaves $i_1, \ldots, i_k, i'_1, \ldots, i'_{k'}, i$, 
together with all their incoming arcs, and then adding to the node u two new 
tree leaf children, labeled $i$ and $i_1$; cf. Fig. 3. Since we remove a complete 
reticulation cycle and all descendants of its intermediate nodes, and we replace it 
by two tree leaves, it is clear that $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$ is a 1-nested 
network on $S \setminus \{i_2, \ldots, i_k, i'_1, \ldots, i'_{k'}\}$ (and in particular, if k = 1 and 
k' = 0, it has the same leaves as N) with 2(k + k') nodes less than N. 

The $\overline{G}$ reductions. The $\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction is the same as 
$G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$, except for the fact that, in order to apply the 
$\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction, the hybrid node h must be the leaf labeled i, 
instead of the leaf's parent: see Fig. 4. Then, $\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$ is a 
1-nested network on $S \setminus \{i_2, \ldots, i_k, i'_1, \ldots, i'_{k'}\}$ and it has 
2(k + k') - 1 nodes less than N. 
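
The two node counts stated above follow from a direct tally of what is removed and what is added. As a sketch of the arithmetic, writing $\Delta$ for the number of nodes lost (a symbol introduced here only for illustration):

```latex
% Reduction where h is the parent of the tree leaf i:
% removed: k + k' intermediate nodes, the hybrid node h,
%          the k + k' leaves hanging from the cycle, and the leaf i;
% added:   two new tree leaves attached to u.
\Delta = \underbrace{(k + k')}_{v_j,\ v'_j} + \underbrace{1}_{h}
       + \underbrace{(k + k')}_{i_j,\ i'_j} + \underbrace{1}_{i} - 2
       = 2(k + k').

% Reduction where h is itself the leaf labeled i:
% there is no separate leaf below h, so one node less is removed.
\Delta = (k + k') + 1 + (k + k') - 2 = 2(k + k') - 1.
```

In particular, since $k > 0$ whenever $k' = 0$, both reductions remove at least one node.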



Fig. 3. The $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction. 

Fig. 4. The $\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction. 



Remark 1. In the G and $\overline{G}$ reductions, we leave two tree leaves attached to the 
former split node of the removed reticulation cycle in order to ensure that their 
application never generates an out-degree 1 tree node, while avoiding an unnecessary 
increase in the number of reductions. 

Now we have the following basic applicability results. 

Proposition 2. Let N be a 1-nested network with more than one leaf. Then, at 
least one R, T, G, or $\overline{G}$ reduction can be applied to N, and the result is a 1-nested 
network. 

Proof. If N contains some internal node v with at least two children that are tree 
leaves, say i and j, then we can apply to N the $R_{i;j}$ reduction, if the out-degree of 
v is 2, or the $T_{i;j}$ reduction, if its out-degree is greater than 2. 

Assume now that N does not contain any internal node with more than one 
tree leaf child: in particular, it does not contain any internal tree node with all its 
children tree leaves. Then, by Proposition 1, it contains a hybrid node h with all 
its children tree leaves (and therefore, by the current assumption on N, h either 
is a leaf itself or has out-degree 1), and such that all the intermediate nodes in its 
reticulation cycle K have all their children outside K tree leaves (and therefore 
each one of them has exactly one child outside K, by Lemma 3 and the current 
assumption on N): let $i_1, \ldots, i_k$ and $i'_1, \ldots, i'_{k'}$, with $k \geq k'$, be the tree leaf 
children of the intermediate nodes of the two respective merge paths of K, listed in 
descending order of their parents along the path. Then, if h has out-degree 1 and 
its child is the tree leaf i, we can apply to N the $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction, while 
if h is the leaf i, we can apply to N the $\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction. 

The fact that the result of the application of an R, T, G, or $\overline{G}$ reduction to N 
is again a 1-nested network has been discussed in the definition of the reductions. 

Corollary 2. Let N be a semibinary 1-nested network with more than one leaf. 
Then, at least one R, T, or G reduction can be applied to N, and the result is a 
semibinary 1-nested network. 

Proof. Since N does not contain hybrid leaves, we cannot apply to it any $\overline{G}$ 
reduction, and therefore, by Proposition 2, we can apply to it at least one R, T, or 
G reduction. 

Now, if we can apply an $R_{i;j}$ or $T_{i;j}$ reduction to N, the common parent of the 
tree leaves i and j is a tree node, and if we can apply a $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction, 
the split node of the reticulation cycle for the hybrid parent of i is a tree node (in 
both cases because hybrid nodes in N have out-degree 1), and therefore neither 
application produces a hybrid node of out-degree different from 1. 

Corollary 3. Let N be a binary 1-nested network with more than one leaf. Then, 
at least one R or G reduction can be applied to N, and the result is a binary 
1-nested network. 

Proof. Since N does not contain nodes with out-degree greater than 2, we cannot 
apply to it any T reduction, and thus, by Corollary 2, we can apply to it some R 
or G reduction. 

Now, if we apply a R reduction to N, we replace an internal tree node with 
two tree leaf children by a tree leaf, and the result is again binary. And if we apply 
to N a G reduction, the split node of the reticulation cycle we remove is, as in the 
semibinary case, a tree node, and in this case moreover without any child outside 
the reticulation cycle (because its out-degree must be 2), and after the application 
of the reduction it is still a tree node of out-degree 2. 

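We shall call the inverses of the R, T, G, and $\overline{G}$ reductions, respectively, the 
$R^{-1}$, $T^{-1}$, $G^{-1}$, and $\overline{G}^{-1}$ expansions, and we shall denote them by $R^{-1}_{i;j}$, $T^{-1}_{i;j}$, 
$G^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$, and $\overline{G}^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$. More specifically, for every 
1-nested network N: 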

- if N contains a leaf labeled with i but no leaf labeled with j, then the $R^{-1}_{i;j}$ 
expansion can be applied to N, and $R^{-1}_{i;j}(N)$ is obtained by unlabeling the leaf 
i and adding to it two tree leaf children labeled i and j; 

- if N contains a tree leaf labeled with i that has some sibling, but no leaf labeled 
with j, then the $T^{-1}_{i;j}$ expansion can be applied to N, and $T^{-1}_{i;j}(N)$ is obtained 
by adding to the parent of the leaf i a new tree leaf child labeled with j; 

- if N contains an internal node u with two tree leaf children $i$, $i_1$ but no leaf 
labeled with $i_2, \ldots, i_k, i'_1, \ldots, i'_{k'}$ (with $k \geq k'$), then the 
$G^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ expansion can be applied to N, and 
$G^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$ is obtained by removing the 
leaves $i$, $i_1$ and their incoming arcs, and then starting in u two new internally 
disjoint paths with $k$ and $k'$, respectively, intermediate nodes and ending in 
the same hybrid node h, and then adding to each intermediate node of these 
paths one new tree leaf and labeling these leaves (in descending order along 
the paths) with $i_1, \ldots, i_k$ and $i'_1, \ldots, i'_{k'}$, respectively, and finally adding to h a 
new tree leaf child labeled with i; 

- the application condition for the $\overline{G}^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ expansion is exactly the same 
as for $G^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$, and $\overline{G}^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$ is as 
$G^{-1}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$, except 
that the new hybrid node is itself a leaf labeled with i. 

From these descriptions we easily see that the result of an $R^{-1}$, $T^{-1}$, $G^{-1}$, or $\overline{G}^{-1}$ 
expansion applied to a 1-nested network is always a 1-nested network. 

The following result is easily deduced from the explicit descriptions of the 
reductions and expansions. 

Lemma 4. Let N and N' be two 1-nested networks. If N ≅ N', then the results of 
applying to both N and N' the same $R^{-1}$ expansion (respectively, $T^{-1}$ expansion, 
$G^{-1}$ expansion or $\overline{G}^{-1}$ expansion) are again two isomorphic 1-nested networks. 

Moreover, if we apply an R, T, G, or $\overline{G}$ reduction to a 1-nested network N, then 
we can apply to the resulting network the corresponding inverse $R^{-1}$, $T^{-1}$, $G^{-1}$, 
or $\overline{G}^{-1}$ expansion and the result is a 1-nested network isomorphic to N. □ 

5 Proving metrics through reductions 

Let C be throughout this section a class endowed with a notion of isomorphism ≅. 
A metric on C is a mapping 

d : C × C → ℝ 

satisfying the following axioms: for every A, B, C ∈ C, 

(a) Non-negativity: d(A, B) ≥ 0; 

(b) Separation: d(A, B) = 0 if and only if A ≅ B; 

(c) Symmetry: d(A, B) = d(B, A); 

(d) Triangle inequality: d(A, C) ≤ d(A, B) + d(B, C). 

A metric space is a pair (X,d) where X is a set and d is a metric on X, taking 
as the notion of isomorphism in X the equality (that is, replacing ≅ by = in the 
separation axiom). 

All distances for hybridization networks considered in this paper are induced 
through representations, in the following sense. A representation of C in a metric 
space (X, d) is a mapping 

F : C → X 

such that if A ≅ B, then F(A) = F(B). 

Given such a representation, the distance induced by d through F is the mapping 

$d_F$ : C × C → ℝ 

defined by $d_F(A,B) = d(F(A),F(B))$, for every A, B ∈ C. 

The metric axioms for d imply that this mapping is non-negative and symmetric, that it 
sends pairs of isomorphic members of C to 0, and that it satisfies the triangle inequality. 
So, to be a metric on C, $d_F$ only needs to satisfy that $d_F(A, B) = 0$ implies A ≅ B. 
Now, it is straightforward to prove the following result (cf. [12, Prop. 1]). 

Lemma 5. The mapping $d_F$ is a metric on C if, and only if, F is injective up to 
isomorphism, in the sense that, for every A, B ∈ C, if F(A) = F(B), then A ≅ B. 

□ 
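
The equivalence in Lemma 5 rests on a one-line chain, which may be worth spelling out. It is only a sketch, using the separation axiom of the metric $d$ on $X$ with equality as the notion of isomorphism:

```latex
d_F(A, B) = 0
  \iff d\bigl(F(A), F(B)\bigr) = 0
  \iff F(A) = F(B).
% Hence the missing half of the separation axiom for d_F,
%   d_F(A, B) = 0 \implies A \cong B,
% holds for all A, B exactly when
%   F(A) = F(B) \implies A \cong B,
% that is, when F is injective up to isomorphism.
```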

Reductions as those introduced in the last section can be used to prove the 
injectivity up to isomorphism of a representation F and hence, as a consequence, 
that the corresponding $d_F$ is a metric; this was done for specific classes C of 
evolutionary networks and specific metrics in [7, 11]. Since we shall use this kind 
of proof several times in this paper, we make explicit here its general outline and 
the lemma it relies on. 

Let $C_{S',m}$ denote a class of 1-nested hybridization networks of some specific 
type on a given set S' and with at most m nodes, and let $C_{S'} = \bigcup_{m \geq |S'|} C_{S',m}$. 
Assume we have a set of reductions $R_1, \ldots, R_s$ that can be applied to members of 
$C_{S'}$, with inverse expansions $R_1^{-1}, \ldots, R_s^{-1}$. Consider the following conditions on 
these reductions and expansions: 

(R1) For every $N \in C_{S',m}$ with $|S'| \geq 2$, there exists some reduction $R_i$ that can be 
applied to N. 

(R2) For every $N \in C_{S',m}$ and for every reduction $R_i$ that can be applied to N, 
$R_i(N) \in C_{S'_i,m_i}$ for some $S'_i \subseteq S'$ and $m_i < m$; moreover, $S'_i$ and $m_i$ only 
depend on S', m and $R_i$, not on N. 

(R3) For every $N \in C_{S',m}$ and for every reduction $R_i$, if $R_i$ can be applied to N, 
then $R_i^{-1}$ can be applied to $R_i(N)$ and $R_i^{-1}(R_i(N)) \cong N$. 

(R4) For every reduction $R_i$ and for every $N, N' \in C_{S'_i,m_i}$ such that $N \cong N'$, if the 
corresponding expansion $R_i^{-1}$ can be applied to N, then it can also be applied 
to N' and the resulting networks are isomorphic. 

The definitions and results given in Section 4 imply that: 

- The set of all R and G reductions satisfy conditions (R1) to (R4) for the classes 
$C_{S'}$ of all binary 1-nested hybridization networks on a set S'. 

- The set of all R, T, and G reductions satisfy conditions (R1) to (R4) for the 
classes $C_{S'}$ of all semibinary 1-nested hybridization networks on a set S'. 

- The set of all R, T, G, and $\overline{G}$ reductions satisfy conditions (R1) to (R4) for 
the classes $C_{S'}$ of all 1-nested hybridization networks on a set S'. 

Now, we have the following result. 

Lemma 6. Let S be a given set of labels. For every $S' \subseteq S$, let $C_{S',m}$ and $R_1, \ldots, R_s$ 
be as above, and assume that these reductions satisfy conditions (R1) to (R4). For 
every $S' \subseteq S$, let $F_{S'} : C_{S'} \to X_{S'}$ be a representation in a metric space $(X_{S'}, d^{(S')})$. 

Then, $F_S$ is injective up to isomorphism if the following two conditions are 
satisfied for every $S' \subseteq S$, for every $m \geq |S'|$, for every reduction $R_i$, and for 
every $N, N' \in C_{S',m}$ such that $F_{S'}(N) = F_{S'}(N')$: 

(A) If $R_i$ can be applied to N, then it can also be applied to N'. 
(R) If $R_i$ is applied to N and N', then $F_{S'_i}(R_i(N)) = F_{S'_i}(R_i(N'))$. 

In particular, if these two conditions are satisfied, then $d^{(S)}_{F_S}$ is a metric on $C_S$. 

Proof. We shall prove by induction on $|S'| + m$ the following statement: 

For every $S' \subseteq S$ and $m \geq |S'|$, if $N, N' \in C_{S',m}$ are such that $F_{S'}(N) = 
F_{S'}(N')$, then $N \cong N'$. 

The starting case, when $|S'| + m = 2$, is obvious because then S' must be a 
singleton, and there is, up to isomorphism, only one 1-nested hybridization network 
on a given singleton {i}: a single node labeled with i. 

Let now $N, N' \in C_{S',m}$ be such that $F_{S'}(N) = F_{S'}(N')$ and $|S'| + m \geq 3$. 
If $|S'| = 1$, we reason as in the starting case to deduce that $N \cong N'$, so we 
assume that $|S'| \geq 2$. By (R1), some reduction $R_i$ can be applied to N, and 
since $F_{S'}(N) = F_{S'}(N')$, by (A) it can also be applied to N'. Then, by (R2), 
$R_i(N), R_i(N') \in C_{S'_i,m_i}$ with $S'_i \subseteq S'$ and $m_i < m$, and by (R) we have that 
$F_{S'_i}(R_i(N)) = F_{S'_i}(R_i(N'))$. Therefore the induction hypothesis applies, implying 
that $R_i(N) \cong R_i(N')$. But then, by (R3), $R_i^{-1}$ can be applied to $R_i(N)$ and $R_i(N')$ 
and $R_i^{-1}(R_i(N)) \cong N$ and $R_i^{-1}(R_i(N')) \cong N'$, while, by (R4), $R_i^{-1}(R_i(N)) \cong 
R_i^{-1}(R_i(N'))$. This implies that $N \cong N'$, as we wanted to prove. 

Thus, in particular, we have that for every $m \geq |S|$, if $N, N' \in C_{S,m}$ are such 
that $F_S(N) = F_S(N')$, then $N \cong N'$. Now notice that if $N, N' \in C_S$, then there 
exists some m such that $N, N' \in C_{S,m}$: take as m the largest number of nodes in 
N or in N'. Therefore, $F_S$ is injective up to isomorphism, as we claimed. 

Remark 2. If one wants to use a result like the last lemma to prove the injectivity 
up to isomorphism of a certain representation of S-rDAGs more general than 
1-nested networks, then it may be necessary to explicitly add to (A) and (R) a third 
condition that covers the starting case: 

(S) For every $i \in S$, $F_{\{i\}}$ is injective up to isomorphism. 

We shall also use a couple of times the following straightforward fact. 

Lemma 7. Let F : C → X and F' : C → X' be two representations of C in metric 
spaces (X,d) and (X',d'), and assume that F(A) = F(B) implies F'(A) = F'(B). 
Then, if F' is injective up to isomorphism, so is F. □ 

When the hypothesis of this lemma is satisfied, we say that F refines F', and 
also that $d_F$ refines $d'_{F'}$. Notice that if $d'_{F'}$ is a metric and $d_F$ refines it, then $d_F$ is 
also a metric. 

6 Robinson-Foulds distance 

Let N = (V, E) be an S-rDAG. For every node v ∈ V, the cluster of v in N is the 
set C(v) ⊆ S (also written $C_N(v)$) of leaves that are descendants of v. The cluster 
representation of N is the multiset 

$C(N) = \{C_N(v) \mid v \in V\}$, 

where each member appears with multiplicity the number of nodes having it as 
cluster. In particular, the cardinal of C(N) (as a multiset, that is, every element 
counted with its multiplicity) is equal to the number of nodes in N. 
The Robinson-Foulds distance between a pair of S-rDAGs N, N' is 

$d_{RF}(N,N') = |C(N) \,\triangle\, C(N')|$, 

where the symmetric difference and its cardinal refer to multisets. It is the 
natural generalization to S-rDAGs of the well known Robinson-Foulds distance for 
phylogenetic trees [34]. 
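
The definition above translates directly into a computation on multisets. The following minimal sketch assumes a hypothetical encoding that is not part of the paper (a dict mapping each node to its list of children, with leaf names as labels) and uses `collections.Counter` as the multiset type; the function names are illustrative.

```python
from collections import Counter


def cluster(children, v):
    """Set of leaves that are descendants of v (a leaf is its own descendant)."""
    seen, stack, leaves = {v}, [v], set()
    while stack:
        x = stack.pop()
        if not children[x]:
            leaves.add(x)
        for y in children[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return frozenset(leaves)


def cluster_representation(children):
    """Multiset of clusters, one occurrence per node of the network."""
    return Counter(cluster(children, v) for v in children)


def robinson_foulds(n1, n2):
    """Cardinal of the multiset symmetric difference of the two cluster multisets."""
    c1, c2 = cluster_representation(n1), cluster_representation(n2)
    # Counter subtraction keeps only positive multiplicities.
    return sum(((c1 - c2) + (c2 - c1)).values())


# Illustrative phylogenetic trees on {1, 2, 3} (assumptions, not from the paper).
T1 = {"r": ["x", "3"], "x": ["1", "2"], "1": [], "2": [], "3": []}
T2 = {"r": ["1", "y"], "y": ["2", "3"], "1": [], "2": [], "3": []}
```

For these two trees the cluster multisets differ only in {1, 2} versus {2, 3}, so the sketch returns distance 2, and the total multiplicity of each representation equals the number of nodes.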

Remark 3. If v is an ancestor of u in N, then C(u) ⊆ C(v), but the converse 
implication is false, even in binary galled trees. See, for instance, Fig. 7 below: in 
both networks, the root and its tree child have the same cluster, but the root is 
not a descendant of its child. 

It is known that the Robinson-Foulds distance is a metric on the class of all 
regular evolutionary networks on a given set S (the networks N such that the 
mapping v ↦ C(v) induces an isomorphism of directed graphs between N and 
the Hasse diagram of (C(N), ⊇)) [3] and on the class of all tree-child phylogenetic 
networks on a given set S that do not contain any hybrid node with two parents 
connected by a path [13]. Unfortunately, 1-nested networks, or even binary galled 
trees, need not be regular (by Remark 3) and they can contain reticulation cycles 
where one merge path is a single arc. So, we cannot use those results to prove that 
the Robinson-Foulds distance is a metric, even on the class of all binary galled 
trees. 

As a matter of fact, the cluster representation is not injective up to 
isomorphism, and hence the Robinson-Foulds distance is not a metric, for 1-nested 
networks, or even galled trees, unless we restrict the possible in- and out-degrees of 
their nodes: they cannot contain either internal tree nodes of out-degree other than 
2 (see Fig. 5), or hybrid nodes of out-degree 0 (see Fig. 6) or greater than 1 (see 
Fig. 7). Therefore, the Robinson-Foulds distance can only be a metric for binary 
1-nested networks, that is, for binary galled trees. Now, we have the following 
result. 




Fig. 5. Two non-isomorphic galled trees with the same cluster representation and internal tree 
nodes of out-degree 3. 




Fig. 6. Two non-isomorphic galled trees with the same cluster representation and hybrid nodes 
of out-degree 0. 




Fig. 7. Two non-isomorphic galled trees with the same cluster representation and hybrid nodes 
of out-degree 2. 

Theorem 2. Let N, N' be two binary 1-nested networks on a given set S such 
that C(N) = C(N'). 

(A) If a specific R or G reduction can be applied to N, then it can also be applied 
to N'. 

(R) If a specific R or G reduction is applied to N and N', the resulting networks 
have the same cluster representations. □ 

In order not to lose the thread of the paper, we postpone the proof of this 
theorem until §A2 in the Appendix at the end of the paper. Combining Lemma 6 
with this theorem, we obtain the following result. 

Corollary 4. The Robinson-Foulds distance is a metric on the class of all binary 
galled trees on a given set S. □ 



7 Tripartitions distance 

Let N = (V, E) be an S-rDAG. For every node v ∈ V, let A(v) ⊆ S be the set of 
(labels of) strict descendant leaves of v and B(v) = C(v) \ A(v) the set of 
non-strict descendant leaves of v; B(v) may be empty, but A(v) ≠ ∅ by Lemma 3. The 
tripartition associated to v [29] is 

θ(v) = (A(v), B(v), S \ C(v)). 

Notice that the tripartition associated to a node v refines its cluster C(v), by 
splitting it into A(v) and B(v). 

The tripartitions representation of N is the multiset 

θ(N) = {θ(v) | v ∈ V} 

of tripartitions of the nodes of N. The tripartitions distance between a pair of 
S-rDAGs N, N' is 

$d_{tri}(N,N') = |\theta(N) \,\triangle\, \theta(N')|$, 

where the symmetric difference and its cardinal refer to multisets. 
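
As an illustration of these definitions, the sketch below computes tripartitions and the induced distance. It relies on assumptions that are not in the paper: networks are dicts mapping each node to its list of children, leaf names double as labels, the root is passed explicitly, and strictness is tested by checking whether a leaf stays reachable from the root once v is removed.

```python
from collections import Counter


def reachable(children, start, avoid=None):
    """Nodes reachable from `start` along paths that never visit `avoid`."""
    if start == avoid:
        return set()
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in children[x]:
            if y != avoid and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def tripartition(children, root, v):
    """Triple (A(v), B(v), S minus C(v)) for the node v."""
    leaves = {x for x, kids in children.items() if not kids}
    cluster = reachable(children, v) & leaves
    # A leaf is a strict descendant of v if it cannot be reached without v.
    strict = cluster - reachable(children, root, avoid=v)
    return (frozenset(strict), frozenset(cluster - strict),
            frozenset(leaves - cluster))


def tripartitions_distance(n1, root1, n2, root2):
    """Cardinal of the multiset symmetric difference of the two representations."""
    t1 = Counter(tripartition(n1, root1, v) for v in n1)
    t2 = Counter(tripartition(n2, root2, v) for v in n2)
    return sum(((t1 - t2) + (t2 - t1)).values())


# Illustrative inputs (assumptions): a binary galled tree and a tree on {1, 2, 3}.
NETWORK = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"],
           "1": [], "2": [], "3": []}
TREE = {"r": ["x", "3"], "x": ["1", "2"], "1": [], "2": [], "3": []}
```

In `NETWORK`, the node a has tripartition ({1}, {2}, {3}): the leaf 2 lies below the hybrid node h and can also be reached through b, so it is a non-strict descendant of a.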

It turns out that the tripartitions distance is a metric on the class of all 1- 
nested networks on a given set. It is a consequence of the following proposition, 
whose proof we postpone until §A3 in the Appendix. 

Theorem 3. Let N, N' be two 1-nested networks on a given set S such that 
θ(N) = θ(N'). 

(A) If a specific R, T, G, or G reduction can be applied to N , then it can also be 
applied to N' . 

(R) If a specific R, T, G, or G reduction is applied to N and N', the resulting 
networks have the same tripartitions representations. □ 

So, using Lemma 6, we deduce the following result. 

Corollary 5. The tripartitions distance is a metric on the class of all 1-nested 
networks on a given set S. □ 

Remark 4. Another refinement (in the sense of Lemma 7) of the Robinson-Foulds 
distance, the so-called μ-distance, was introduced by Cardona et al [14] and proved 
to be a metric on the class of all tree-child S-rDAGs for any given S: then, in 
particular, it is a metric on the class of all 1-nested networks on a set S. Soon after, 
L. Nakhleh [31] proposed a distance m that turned out to refine the μ-distance [12] 
and that is therefore also a metric on the class of all 1-nested networks on a set 
S. The interested reader can look up the aforementioned references for the specific 
definitions of these metrics. 

8 Nodal and splitted nodal distances 

Let N = (V, E) be an S-rDAG; to simplify the language, throughout this section we 
assume that S = {1, ..., n} with n = |S|. Recall from [10] that the least common 
semistrict ancestor, LCSA for short, of a pair of nodes u, v ∈ V is the node that is 
a common ancestor of u and v and strict ancestor of at least one of them, and that 
is a descendant of all other nodes in N satisfying these properties. Such an LCSA 
of a pair of nodes u, v always exists and it is unique [10, §IV], and we shall denote 
it by [u, v]. 

The LCSA of a pair of nodes in a phylogenetic tree is their lowest common 
ancestor. It turns out that such a characterization extends to 1-nested networks. 
Recall that a lowest common ancestor, LCA for short, of a pair of nodes u, v in 
a rDAG is any common ancestor of u and v that is not a proper ancestor of any 
other common ancestor of them [5] . 

Lemma 8. Every pair of nodes u,v in a 1-nested network has only one LCA, and 
it is their LCSA. 

Proof. Let x be any LCA of u and v, and let us prove that x must be a strict 
ancestor of u or v. Indeed, by Lemma 9 in §A1 in the Appendix, if x is not a strict 
ancestor of u, then it is intermediate in the reticulation cycle for a hybrid node 
h_u that is a strict ancestor of u. In a similar way, if x is not a strict ancestor of v, 
then it is intermediate in the reticulation cycle for a hybrid node h_v that is a strict 
ancestor of v. Now, if x were not a strict ancestor either of u or of v, then it either 
would happen that it is intermediate in reticulation cycles for two different hybrid 
nodes, which is impossible in a 1-nested network, or that it is a proper ancestor of 
a common ancestor of u and v, namely h_u = h_v, against the assumption that x is 
an LCA of u and v. 

So, x is a common ancestor of u and v and a strict ancestor of at least one of 
them, and thus it is an ancestor of [u,v]. Since x cannot have proper descendants 
that are common ancestors of u and v, we conclude that x = [u, v]. 
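
Lemma 8 can be checked on small examples with the following Python sketch, which computes all LCAs and the LCSA of a pair of nodes directly from the definitions. The dictionary encoding of the network, the function names and the example galled tree are assumptions made for illustration; the brute-force search is meant for tiny networks only.

```python
def reach(children, v):
    """Set of nodes reachable from v, v included."""
    seen, stack = set(), [v]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(children.get(x, []))
    return seen

def is_strict_ancestor(children, root, a, x):
    """True if x descends from a and every path from the root to x contains a."""
    if x not in reach(children, a):
        return False
    if a == root:
        return True
    pruned = {p: [c for c in cs if c != a]
              for p, cs in children.items() if p != a}
    return x not in reach(pruned, root)

def common_ancestors(children, root, u, v):
    return [a for a in reach(children, root)
            if u in reach(children, a) and v in reach(children, a)]

def lcas(children, root, u, v):
    """Common ancestors that are not proper ancestors of another common ancestor."""
    cas = common_ancestors(children, root, u, v)
    return [a for a in cas
            if not any(b != a and b in reach(children, a) for b in cas)]

def lcsa(children, root, u, v):
    """Common ancestor, strict for u or v, that descends from all other such nodes."""
    cands = [a for a in common_ancestors(children, root, u, v)
             if is_strict_ancestor(children, root, a, u)
             or is_strict_ancestor(children, root, a, v)]
    for a in cands:
        if all(a in reach(children, b) for b in cands):
            return a
    return None

# Hypothetical galled tree: hybrid node h with parents a and b, split node r.
N = {"r": ["a", "b"], "a": [1, "h"], "b": ["h", 3], "h": [2]}
```

In this example the leaves 1 and 2 have a single LCA, the node `a`, and it coincides with their LCSA, as the lemma states.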

For every pair of leaves i, j ∈ S, let ℓ_N(i, j) and ℓ_N(j, i) be the distances from 
[i, j] to i and to j, respectively, and let ν_N(i, j) = ℓ_N(i, j) + ℓ_N(j, i). 
The LCSA-path lengths matrix of N is the symmetric matrix 

\[ \nu(N) = \begin{pmatrix} \nu_N(1,1) & \cdots & \nu_N(1,n) \\ \vdots & \ddots & \vdots \\ \nu_N(n,1) & \cdots & \nu_N(n,n) \end{pmatrix} \]

and the splitted LCSA-path lengths matrix of N is the (not necessarily symmetric) 
matrix 

\[ \ell(N) = \begin{pmatrix} \ell_N(1,1) & \cdots & \ell_N(1,n) \\ \vdots & \ddots & \vdots \\ \ell_N(n,1) & \cdots & \ell_N(n,n) \end{pmatrix} \]

The nodal distance between a pair of S-rDAGs N, N' is half the Manhattan, 
or L1, distance between ν(N) and ν(N'): 

\[ d_\nu(N, N') = \frac{1}{2} \sum_{1 \le i, j \le n} |\nu_N(i,j) - \nu_{N'}(i,j)|. \]

The splitted nodal distance between N and N' is the Manhattan distance between 
ℓ(N) and ℓ(N'): 

\[ d_\ell(N, N') = \sum_{1 \le i, j \le n} |\ell_N(i,j) - \ell_{N'}(i,j)|. \]

Of course, instead of using the Manhattan distance on the set of n x n matrices, 
one can use any other distance for real-valued matrices to compare LCSA-path 
lengths, or splitted LCSA-path lengths, matrices, like for instance the euclidean 
distance. The results in this section do not depend on the actual metric for real- 
valued matrices used. 
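
As a small illustration of these two distances, the following Python sketch computes them from splitted LCSA-path lengths matrices given as nested lists. The function names and the two 2 x 2 matrices are assumptions chosen for the example (they are not read off any figure of the paper); the matrix ν is derived from ℓ through ν_N(i, j) = ℓ_N(i, j) + ℓ_N(j, i).

```python
def nu_matrix(ell):
    """LCSA-path lengths matrix: nu(i, j) = ell(i, j) + ell(j, i)."""
    n = len(ell)
    return [[ell[i][j] + ell[j][i] for j in range(n)] for i in range(n)]

def manhattan(m1, m2):
    """Manhattan (L1) distance between two matrices of equal shape."""
    return sum(abs(a - b)
               for row1, row2 in zip(m1, m2)
               for a, b in zip(row1, row2))

def nodal_distance(ell1, ell2):
    """Half the Manhattan distance between the nu matrices."""
    return manhattan(nu_matrix(ell1), nu_matrix(ell2)) / 2

def splitted_nodal_distance(ell1, ell2):
    """Manhattan distance between the splitted matrices."""
    return manhattan(ell1, ell2)

# Hypothetical splitted matrices on S = {1, 2}: the LCSA of the two leaves
# is at distance 1 from one leaf and at distance 2 from the other.
ell_a = [[0, 1], [2, 0]]
ell_b = [[0, 2], [1, 0]]
```

With these matrices the nodal distance vanishes while the splitted nodal distance does not, which shows in miniature why ℓ can separate networks that ν cannot.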

The nodal distance d_ν is the natural generalization to S-rDAGs of the classical 
nodal metric for binary phylogenetic trees [17,38], while the splitted nodal 
distance d_ℓ generalizes to S-rDAGs the recently introduced homonymous metric 
for arbitrary phylogenetic trees [8]. 

It is known [9, 11] that d_ν is a metric on the class of all binary tree-child time 
consistent phylogenetic networks on a given set S, and that d_ℓ is a metric on the 
class of all tree-child time consistent phylogenetic networks on a given set S, but no 
binary galled tree containing a reticulation cycle with one merge path consisting of 
a single arc is time consistent, and therefore we cannot use these results to prove 
that d_ν or d_ℓ are metrics even for binary galled trees. 

It turns out that ν is not injective up to isomorphism, and hence d_ν is not a 
metric, even for binary galled trees, as Fig. 8 shows. As far as ℓ goes, it is not 
injective up to isomorphism for 1-nested networks, or even galled trees, that are 
not semibinary: if we allow hybrid nodes of out-degree 0 (see Fig. 9) or greater 
than 1 (see Fig. 10), there exist pairs of non-isomorphic galled trees with the same 
splitted LCSA-path length matrices. Therefore, d_ℓ can be a metric at most on the 
class of all semibinary 1-nested networks. Now, we have the following result. 

Fig. 8. Two non-isomorphic binary galled trees on S = {1,2} with the same LCSA-path 
length, 3, between their only two leaves. 




Fig. 9. Two non-isomorphic galled trees with the same ℓ matrix and hybrid nodes of out-degree 
0. 




Fig. 10. Two non-isomorphic galled trees with the same ℓ matrix and hybrid nodes of out-degree 
greater than 1. 



Theorem 4. Let N, N' be two semibinary 1-nested networks on a given set S such 
that ℓ(N) = ℓ(N'). 

(A) If a specific R, T, or G reduction can be applied to N, then it can also be 
applied to N'. 

(R) If a specific R, T, or G reduction is applied to N and N', the resulting networks 
have the same splitted LCSA-path lengths matrices. □ 

As we did previously, we postpone the proof of this theorem until §A4 in the 
Appendix at the end of the paper. Combining Lemma 6 with this theorem, we 
obtain the following result. 

Corollary 6. The splitted nodal distance is a metric on the class of all semibinary 
1-nested networks on a given set S. □ 



9 Conclusion 

Several slightly different definitions of galled tree, capturing the notion of a hy- 
bridization network with isolated reticulation cycles, have been proposed so far in 
the literature. The most general such definition is as a network with arc-disjoint 
reticulation cycles [16, 26], called in this paper 1-nested, and the most restrictive is 
Gusfield et al's original definition of a galled tree as a network with node-disjoint 
reticulation cycles [19]: in between lie the level-1 networks of Janson, Sung et al [25, 
27]. In the semibinary (hybrid nodes of in-degree 2 and out-degree 1) case, level-1 
and 1-nested networks are the same, and in the binary (semibinary plus tree nodes 
of out-degree 2) case, galled trees, level-1 networks and 1-nested networks are the 
same objects. 

In this paper we have established for which classes of 1-nested networks on a 
fixed set of labels several distance measures introduced so far in the literature 
satisfy the axioms of metrics: actually, only the separation axiom (distance 0 means 
isomorphism) is relevant here, because all other axioms of metrics are always 
satisfied by these distances. In summary, we have proved that: 

(a) The Robinson-Foulds distance [3, 10] is a metric only for binary galled trees. 

(b) The tripartitions distance [29], the μ-distance [14] and Nakhleh's metric m for 
reduced networks [31] are metrics for arbitrary 1-nested networks. 

(c) The natural translation of the nodal distance for phylogenetic trees to evolu- 
tionary networks [9] is not a metric even for binary galled trees. 

(d) The splitted nodal distance [9, 11] is a metric for semibinary 1-nested networks, 
but not for arbitrary galled trees. 

We would like to mention that the 1-nested networks turn out to form the first well- 
defined class of evolutionary networks where the tripartitions distance is shown to 
be a metric but the Robinson-Foulds distance is not a metric. 

There are other distances that have not been discussed in this paper because 
they obviously fail to be metrics even for binary galled trees. This is the case of 
the triplets distance [11], which cannot be a metric for binary galled trees because 
there are many more binary galled trees with 3 leaves than possible triplets in the 
sense defined in the aforementioned paper. And, as it was already observed in [14, 
§II.D], it is also the case of any distance defined by comparing the multisets of 
induced subtrees, or the multisets of splits of induced subtrees: for instance, the 
pairs of galled trees depicted in Figs. 9 or 10 have the same multisets of induced 
subtrees. 

The splitted nodal distance and the triplets distance were introduced in [11] as 
suitable generalizations of the corresponding distances for phylogenetic networks 
with the aim of obtaining metrics on the class of tree-child time consistent phylogenetic 
networks, and hence they were not designed to cope with reticulation cycles 
where one merge path is a single arc. This is the main reason for their failure as 
metrics for arbitrary 1-nested networks. But it seems not difficult to modify them 
to obtain metrics for 1-nested networks, by taking into account the restricted, and 
specific, topological structure of these networks: something similar was already 
done with the splitted nodal distance to make it work on tree-child time consistent 
evolutionary networks with hybrid nodes of (almost) arbitrary type [9]. 

Galled trees, 1-nested networks, and level-1 networks are defined as having 
hybrid nodes of in-degree 2, in the first case for semantic reasons and in the 
other two cases for practical reasons (to guarantee that certain reconstruction 
algorithms run in polynomial time), and we have kept this restriction in this paper. 
But, although Gusfield et al's node-disjoint reticulation cycles condition implies 
that hybrid nodes must have in-degree 2, this restriction is not necessary in level- 
1 and 1-nested networks, and polynomial time algorithms for the reconstruction 
of level-1 or 1-nested networks with hybrid nodes of arbitrary in-degree may be 
discovered in the future, in which case it would be interesting to know whether the 
distance measures discussed in this paper define metrics in this more general case 
and they can be used thus to assess these new algorithms. 

Appendix: Proofs of the main theorems 

A1 Some lemmas on clusters and tripartitions 

We establish in this subsection some basic properties of clusters on 1-nested 
networks that will be used in the proofs of the next two subsections. To simplify the 
notations, given a 1-nested network N on a set S, let C_I(N) denote the multiset of 
clusters of its internal nodes. C_I(N) is obtained by removing from C(N) one copy 
of every singleton {i} with i ∈ S. 

Lemma 9. Let N be a 1-nested network on S. 

(a) For every i ∈ S and for every internal node v, C(v) = {i} if, and only if, i is 
a tree leaf and v is its parent and it has out-degree 1. 

(b) If two leaves i, j are such that there does not exist any member of C_I(N) 
containing one of them and not the other, then they are siblings. 

(c) Let v be a tree node and u its only parent. If C(u) ≠ C(v), then C(u) is the 
only (up to multiplicities) minimal member of C_I(N) strictly containing C(v). 
If C(u) = C(v) and u has out-degree greater than 1, then u is the split node of 
a reticulation cycle such that one of the merge paths contains v as intermediate 
node and the other merge path is a single arc. 

(d) If a node v is a non-strict descendant of a node u, then u is intermediate in 
the reticulation cycle for a hybrid node that is a strict ancestor of v. 

Proof. (a) If v is a node with only one child and this child is the tree leaf i, then 
C(v) = {i} ∈ C_I(N). Conversely, let v be an internal node such that C(v) = 
{i}. Since every internal node in N has a tree descendant leaf, i must be a tree 
descendant leaf of v (and in particular a tree leaf). Let w be the parent of i, and 
let us prove that it has out-degree 1. Indeed, if u is a child of w other than i, it 
has a tree descendant leaf j, and j ≠ i, because, otherwise, the only parent w of i 
would be a descendant of its child u. But then j ∈ C(v), against the assumption 
that C(v) = {i}. 

So, the tree path v⇝i cannot have any intermediate node, because otherwise w 
would be intermediate in this path and hence it would be a tree node, but internal 
tree nodes in N have out-degree greater than 1. Therefore, v is the parent of i. 
But then, as we have just seen, it must have out-degree 1. 

(b) Assume that every member of C_I(N) containing i or j contains both 
of them, but that i and j are not siblings. Let v_1 be a parent of i: then i ∈ C(v_1) 
implies j ∈ C(v_1) and, since v_1 is not a parent of j, v_1 is a proper ancestor of some 
parent w_1 of j. Then, j ∈ C(w_1) implies i ∈ C(w_1) and thus, since w_1 is not a 
parent of i, w_1 is a proper ancestor of some parent v_2 ≠ v_1 (because v_1 is a proper 
ancestor of w_1) of i. Iterating this process, we obtain that v_2 is a proper ancestor 
of another parent w_2 ≠ w_1 of j, and then that w_2 is a proper ancestor of another 
parent v_3 ≠ v_1, v_2 of i, which is impossible because every node in N has at most 2 
parents. 

(c) Let u be the parent of the tree node v. Assume that C(u) ≠ C(v) and let i 
be a tree descendant leaf of v, and hence also of u. For every other internal node 
w, if C(v) ⊊ C(w), then i ∈ C(w), and therefore either the path w⇝i is contained 
in the path u⇝i or conversely. But C(v) ⊊ C(w) implies that w cannot be a 
descendant of v, and we conclude that u⇝i is contained in the path w⇝i, and 
hence u is a descendant of w, which implies that C(u) ⊆ C(w). 

Assume now that u has out-degree greater than 1 and that C(u) = C(v). Let v' 
be another child of u and let j be a tree descendant leaf of v'. Then, j ∈ C(v), and 
therefore either the path v⇝j contains the path v'⇝j or the path v'⇝j contains 
the path v⇝j. But the last situation is impossible, because if v belongs to the path 
v'⇝j, its only parent u should also belong to it, and u cannot be a descendant 
of its child v'. So, we conclude that v' is a descendant of v, and therefore that v' 
is a hybrid node and its reticulation cycle consists of the arc (u, v') and a path 
(u, v, ..., v'). 

(d) Assume that v is a non-strict descendant of u. Let u⇝v be any path from u to 
v, and r⇝v a path from the root r of N to v not containing u. Let w be the first 
node in u⇝v contained also in r⇝v. Since, by assumption, w ≠ u and, clearly, 
w ≠ r, w will have different parents in both paths, which implies that it is hybrid. 
Let now r⇝u be any path from the root to u and let x be the last node in this 
path belonging to the subpath r⇝w of r⇝v: again, x ≠ u. Then, the subpath 
x⇝w of r⇝v and the concatenation of the subpath x⇝u of r⇝u and the 
subpath u⇝w of u⇝v are internally disjoint, and hence they form a reticulation 
cycle for w with split node x and having u as intermediate node. 

It remains to prove that w is a strict ancestor of v. But if it were not, then, as 
we have just seen, w would be intermediate in a reticulation cycle for an ancestor 
of v, which is impossible by Lemma 3. 

Lemma 10. Let N be a 1-nested network on S, let h be a hybrid node of N with 
C(h) = {i}, and let K be its reticulation cycle, with split node u. 

(a) No pair of intermediate nodes of K in different merge paths are connected by 
a path. 

(b) Every pair of intermediate nodes in K have different clusters, and different 
also from C(h). 

(c) The only non-strict descendant of each intermediate node of K is i. 

(d) The intersection of the clusters of any pair of intermediate nodes of different 
merge paths of K is {i}. 

(e) i is a strict descendant of u. 

(f) If v is a node outside K such that i ∈ C(v), then v is an ancestor of u and 
thus C(u) ⊆ C(v). 

(g) All clusters of intermediate nodes in K have multiplicity 1 in C_I(N), except 
the cluster of the child other than h of u when one of the merge paths consists 
of a single arc (u, h). 

(h) The minimal elements of C_I(N) strictly containing C(h) are the clusters of the 
parents of h that are intermediate in K. 

Proof. By Lemma 9.(a), C(h) = {i} implies that either h = i or that i is a tree 
child of h, and its only child. 

(a) If x and y were two intermediate nodes of K belonging to different merge paths 
and there existed a path x⇝y, then the first node in this path also belonging to 
the path u⇝y would have different parents in both paths, and therefore it would 
be hybrid, which is impossible by Lemma 3. 

(b) Let x and y be two different intermediate nodes of K: if they belong to the 
same merge path, we take them so that y is a proper descendant of x. We shall 
prove that C(x) ≠ C(y). 

Since both nodes are of tree type, x has a child v outside K. Let l be a tree 
descendant leaf of v, and assume that l ∈ C(y). Then, either the path v⇝l contains 
the path y⇝l or vice versa. But the tree path u⇝y contained in the merge path is 
the unique path from u to y, and it does not contain v, and therefore y cannot be 
a descendant of v. Thus, v is a descendant of y, and since x is not a descendant of 
y by (a), we conclude that v is a hybrid node such that its parent other than x is 
a descendant of y. But then, y is intermediate in the reticulation cycle of v, which 
is impossible because it is already intermediate in the reticulation cycle of h. So, 
we reach a contradiction that implies that l ∉ C(y), and hence that C(x) ≠ C(y). 

On the other hand, Lemma 9.(a) implies that, for every proper ancestor x of 
h, C(h) = {i} ⊊ C(x). 

(c) Let x be an intermediate node of K and l a descendant leaf of x other than i. If l 
were a non-strict descendant of x, then x would be intermediate in the reticulation 
cycle of a hybrid ancestor of l by Lemma 9.(d), which is impossible because x is 
already intermediate in K and h is not an ancestor of l. Thus, every descendant 
leaf of x other than i is a strict descendant of x. 

On the other hand, the fact that i is a non-strict descendant of x is obvious: 
the composition of any path r⇝u with the merge path u⇝h not containing x, 
and ending, if necessary, with the arc (h, i), yields a path r⇝i not containing x. 

(d) Let x and y be two intermediate nodes of different merge paths of K. If there 
existed some leaf l ≠ i in C(x) ∩ C(y), then it would be a strict descendant of both 
x and y by (c), which would imply by Lemma 1 that x and y are connected by a 
path, against (a). 

(e) Any path r⇝i contains h and therefore it contains one of its parents. But 
the merge path from u to any parent of h is a tree path, and hence it must be 
contained in the subpath r⇝h of r⇝i. This implies that u belongs to the path 
r⇝i. 

(f) Let v be a node outside K such that i ∈ C(v). Then, by (e) and Lemma 1, u 
and v are connected by a path. Now, since v ≠ h and i ∈ C(v), v will be an ancestor of one of 
the parents of h, say x. But then, if v were a descendant of u, it would belong to 
the only path u⇝x, which is contained in K, against the assumption that v does 
not belong to K. Thus, u is a descendant of v. 

(g) Let x be an intermediate node of K and assume that there exists some w ≠ x 
such that C(w) = C(x). We know by (b) that w is neither h (because C(h) ≠ C(x)) 
nor any intermediate node of K and therefore, by (f), C(u) ⊆ C(w) = C(x). Thus, 
C(x) contains all clusters of nodes in K, which implies that the merge path not 
containing x cannot contain any intermediate node (by (d)) and that x is the child 
of u in the only merge path of K of length greater than 1 (otherwise, the cluster 
of its parent in the merge path would strictly contain C(x), by (b), and would be 
included in C(u)). 

(h) Let v and v' be the parents of h. Since every proper ancestor w of h is an 
ancestor of v or v', and hence C(w) contains C(v) or C(v'), we deduce that C(v) 
and C(v') are the only possible minimal members of C_I(N) strictly containing 
C(h). 

Now, if C(v) and C(v') are two different such minimal members of C_I(N), then 
they do not contain each other and therefore v and v' are not connected by a path. 
This implies that neither v nor v' is the split node u of K, and therefore that they 
are intermediate in K. Conversely, if only one of these two clusters, say C(v), is 
minimal strictly containing C(h), then it is contained in the other. By (d), this 
implies that v' cannot be intermediate in K, and therefore v' = u. 

A2 Proof of Theorem 2 

To ease the task of the reader, we split the proof of Theorem 2 into several 
lemmas. Throughout this subsection, N stands for a binary 1-nested network (or, 
equivalently, a binary galled tree) on a fixed set S. 

Lemma 11. The R_{i;j} reduction can be applied to N if, and only if, {i, j} ∈ C_I(N) 
but {i}, {j} ∉ C_I(N). 

Proof. If N contains a node u whose children are the tree leaves i, j, then {i, j} = 
C(u) ∈ C_I(N), and {i}, {j} ∉ C_I(N) by Lemma 9.(a). 

Conversely, if {i, j} ∈ C_I(N) and {i}, {j} ∉ C_I(N), then i and j are tree 
leaves (by the binarity of N) and their parents have out-degree greater than 1 
by Lemma 9.(a). Let now u be the parent of i. Since every internal ancestor of i 
is an ancestor of its only parent u, the cluster of any internal ancestor of i must 
contain the cluster of u: in particular, i ∈ C(u) ⊆ {i, j}, which implies (since 
{i} ∉ C_I(N)) that C(u) = {i, j}. But then, if i ∈ C(v) for some internal node v, 
then {i, j} ⊆ C(v). This shows that every member of C_I(N) that contains i also 
contains j. By symmetry, every member of C_I(N) that contains j also contains i. 
Then, Lemma 9.(b) applies. 
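
The criterion of Lemma 11 depends only on the cluster multiset, which the following Python sketch makes explicit. It assumes that C(N) is given as a list of clusters with repetitions; the function names and the two example multisets (a three-leaf tree and a small binary galled tree) are hypothetical and only meant as an illustration.

```python
from collections import Counter

def internal_clusters(clusters, labels):
    """C_I(N): remove one copy of each singleton {i}, i in S, from C(N)."""
    ci = Counter(frozenset(c) for c in clusters)
    for i in labels:
        ci[frozenset([i])] -= 1
    return +ci  # unary plus drops entries whose count is not positive

def r_reduction_applicable(ci, i, j):
    """Condition of Lemma 11: {i, j} in C_I(N) but {i}, {j} not in C_I(N)."""
    return (ci[frozenset([i, j])] > 0
            and ci[frozenset([i])] == 0
            and ci[frozenset([j])] == 0)

# Tree ((1,2),3): clusters of the root, the parent of 1 and 2, and the leaves.
tree = internal_clusters([{1, 2, 3}, {1, 2}, {1}, {2}, {3}], [1, 2, 3])

# Binary galled tree on {1, 2}: split node and one intermediate node with
# cluster {1, 2}, hybrid node with cluster {2}, and the two leaves.
galled = internal_clusters([{1, 2}, {1, 2}, {2}, {2}, {1}], [1, 2])
```

In the tree, `r_reduction_applicable(tree, 1, 2)` holds, whereas in the galled tree the cluster {2} of the hybrid node remains in C_I(N) and the condition fails for the pair 1, 2.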

Lemma 12. The G_{i;i_1,...,i_k;∅} reduction can be applied to N if, and only if, the 
following conditions are satisfied: 

(1) {i} ∈ C_I(N). 

(2) For every j = 2, ..., k, {i_j, ..., i_k, i} ∈ C(N) with multiplicity 1. 

(3) {i_1, ..., i_k, i} ∈ C(N) with multiplicity at least 2. 

(4) Any member of C_I(N) containing some label among i_1, ..., i_k and not listed in 
(2)-(3), must contain {i_1, ..., i_k, i}. 

Proof. If N contains a reticulation cycle K consisting of the merge paths (u, h) 
and (u, v_1, ..., v_k, h) (and hence h and v_1 are the only children of u), such that the 
only child of the hybrid node h is the leaf i and the child outside K of each tree 
node v_j is the tree leaf i_j, then 

C(h) = {i} 
C(v_j) = {i_j, ..., i_k, i}, j = 1, ..., k 
C(u) = {i_1, ..., i_k, i} 

and hence C_I(N) contains all clusters listed in (1)-(2), the latter with multiplicity 
1 by Lemma 10.(g), as well as the cluster given in (3) with multiplicity at least 
2. Now, let v be any internal node of N not belonging to K and such that C(v) 
contains some label i_1, ..., i_k. If i_j ∈ C(v), then i_j's only parent v_j must also be a 
descendant of v. But then i ∈ C(v_j) ⊆ C(v) implies that C(u) ⊆ C(v) by Lemma 
10.(f), as (4) claims. 

Conversely, assume that (1)-(4) are satisfied. Then, the parent h of i has 
out-degree 1 and therefore it is hybrid, and, by Lemma 10.(h), its parents are connected 
by a path, because there is only one minimal element of C_I(N) strictly containing 
{i}, namely {i_k, i}. Therefore, the reticulation cycle K for h consists of an arc 
(u, h) and a tree path (u, v_1, ..., v_l, h) with l ≥ 1. In this situation, Lemma 10 
implies that: 

- C(v_l) is the minimal element of C_I(N) strictly containing {i}; 

- C(v_l) ⊊ C(v_{l-1}) ⊊ ... ⊊ C(v_1), and then, by Lemma 9.(c), each C(v_j), j = 
1, ..., l - 1, is the minimal element of C_I(N) containing C(v_{j+1}); 

- C(v_2), ..., C(v_l) appear with multiplicity 1 in C_I(N); 

- C(u) = C(v_1), because the only children of u are v_1 and h. 

On the other hand, (1)-(4) imply that: 

- The minimal element of C_I(N) strictly containing {i} is {i_k, i}; 

- {i_k, i} ⊊ {i_{k-1}, i_k, i} ⊊ ... ⊊ {i_1, ..., i_k, i}, and each {i_j, ..., i_k, i}, j = 1, ..., k - 1, 
is the minimal element of C_I(N) strictly containing {i_{j+1}, ..., i_k, i}; 

- {i_2, ..., i_k, i}, ..., {i_k, i} appear with multiplicity 1 in C_I(N); 

- {i_1, ..., i_k, i} appears with multiplicity at least 2 in C_I(N). 

The only possibility of making these two lists of properties compatible is that k = l 
and C(v_j) = {i_j, ..., i_k, i} for every j = 1, ..., k. 

It remains to prove that the only child of every v_j outside K is the corresponding 
leaf i_j. Let w_j be the only parent of i_j; we want to prove that w_j = v_j. Since 
i_j ∈ C(v_j), there exists a path v_j⇝w_j and hence C(w_j) ⊆ C(v_j). On the other 
hand, i_j ∈ C(w_j) implies, by (4), that i ∈ C(w_j) and therefore, by Lemma 10.(f), 
either w_j belongs to K or it is an ancestor of u. The second case cannot hold, 
because w_j is a proper descendant of u. Therefore, w_j is a node of K that is a 
descendant of v_j and an ancestor of i_j: it must be v_j. 

Lemma 13. The G_{i;i_1,...,i_k;i'_1,...,i'_{k'}} reduction, with k ≥ k' > 0, can be applied to N 
if, and only if, the following conditions are satisfied: 

(1) {i} ∈ C_I(N). 

(2) For every j = 1, ..., k, {i_j, ..., i_k, i} ∈ C(N) with multiplicity 1. 

(3) For every j = 1, ..., k', {i'_j, ..., i'_{k'}, i} ∈ C(N) with multiplicity 1. 

(4) {i_1, ..., i_{k-1}, i_k, i'_1, ..., i'_{k'-1}, i'_{k'}, i} ∈ C(N). 

(5) Any member of C_I(N) containing some label among i_1, ..., i_k, i'_1, ..., i'_{k'} and 
not listed in (1)-(4), must contain {i_1, ..., i_k, i'_1, ..., i'_{k'}, i}. 

Proof. The proof that if N contains a reticulation cycle K consisting of the merge 
paths (u, v_1, ..., v_k, h) and (u, v'_1, ..., v'_{k'}, h), with k ≥ k' > 0, such that the only 
child of the hybrid node h is the leaf i, the child outside K of each tree node v_j 
is the tree leaf i_j, and the child of each tree node v'_j outside K is the tree leaf i'_j, 
then it satisfies conditions (1) to (5), is similar to the proof of the corresponding 
implication in the previous lemma, and we do not repeat it here. 

As far as the converse implication goes, assume that conditions (1)-(5) in the 
statement are satisfied. Then, the parent h of i has out-degree 1 and therefore it 
is hybrid, and, by Lemma 10.(h), its two parents are not connected by a path, 
because there are two minimal elements of C_I(N) strictly containing {i}, namely 
{i_k, i} and {i'_{k'}, i}. Therefore, the reticulation cycle K for h consists of two merge 
paths (u, v_1, ..., v_l, h) and (u, v'_1, ..., v'_{l'}, h) with l, l' ≥ 1. In this situation, Lemma 
10 implies that: 

- C(v_l) and C(v'_{l'}) are the minimal elements of C_I(N) strictly containing {i}; 

- C(v_l) ⊊ ... ⊊ C(v_1), and then, by Lemma 9.(c), each C(v_j), j = 1, ..., l - 1, 
is the minimal element of C_I(N) containing C(v_{j+1}); 

- C(v'_{l'}) ⊊ ... ⊊ C(v'_1), and then, by Lemma 9.(c), each C(v'_j), j = 1, ..., l' - 1, 
is the minimal element of C_I(N) containing C(v'_{j+1}); 

- C(v_1), ..., C(v_l), C(v'_1), ..., C(v'_{l'}) appear with multiplicity 1 in C_I(N); 

- the minimal element of C_I(N) strictly containing C(v_1) is the same as the 
minimal element of C_I(N) strictly containing C(v'_1), and it is C(u). 

On the other hand, (1)-(5) imply that: 

- The minimal elements of C_I(N) strictly containing {i} are {i_k, i} and {i'_{k'}, i}; 

- {i_k, i} ⊊ ... ⊊ {i_1, ..., i_k, i}, and each {i_j, ..., i_k, i}, j = 1, ..., k - 1, is the 
minimal element of C_I(N) strictly containing {i_{j+1}, ..., i_k, i}; 

- {i'_{k'}, i} ⊊ ... ⊊ {i'_1, ..., i'_{k'}, i}, and each {i'_j, ..., i'_{k'}, i}, j = 1, ..., k' - 1, is the 
minimal element of C_I(N) strictly containing {i'_{j+1}, ..., i'_{k'}, i}; 

- {i_k, i}, ..., {i_1, ..., i_k, i}, {i'_{k'}, i}, ..., {i'_1, ..., i'_{k'}, i} appear with multiplicity 1 in 
C_I(N); 

- the minimal element of C_I(N) strictly containing {i_1, ..., i_k, i} is the same as 
the minimal element containing {i'_1, ..., i'_{k'}, i}, and it is {i_1, ..., i_k, i'_1, ..., i'_{k'}, i}. 

The only possibility of making these two lists of properties compatible is that (up 
to the interchange of k and k') k = l, k' = l', C(v_j) = {i_j, ..., i_k, i} for every 
j = 1, ..., k, and C(v'_j) = {i'_j, ..., i'_{k'}, i} for every j = 1, ..., k'. 

It remains to prove that the only child of every v_j (respectively v'_j) not belonging 
to the reticulation cycle for h is the corresponding leaf i_j (respectively i'_j). This 
fact can be proved using the same argument as in the last paragraph of the proof 
of the previous lemma. 

Lemmas 11 to 13 prove that the fact that a given R or G reduction can be 
applied to N only depends on C(N), from which point (A) in Theorem 2 follows. 
As far as point (R) goes, it is a consequence of the following straightforward lemma 
that shows that the application of a specific R or G reduction to N affects C(N) 
in a way that does not depend on N itself, but only on its cluster representation; 
we leave its easy proof to the reader. 

Lemma 14. (a) If the R_{i;j} reduction can be applied to N, then C(R_{i;j}(N)) is 
obtained by removing from C(N) the clusters {i} and {j}, and then removing 
from all remaining clusters the label j. 

(b) If the G_{i;i_1,...,i_k;∅} reduction can be applied to N, then C(G_{i;i_1,...,i_k;∅}(N)) is 
obtained by first removing from C(N) all clusters listed in points (1)-(2) of 
Lemma 12, one copy of the cluster given in point (3) therein, and the clusters 
{i_2}, ..., {i_k}, and then removing the labels i_2, ..., i_k from all remaining 
clusters. 

(c) If the G_{i;i_1,...,i_k;i'_1,...,i'_{k'}} (with k' ≠ 0) reduction can be applied to N, then 
C(G_{i;i_1,...,i_k;i'_1,...,i'_{k'}}(N)) is obtained by first removing from C(N) all clusters 
listed in points (1)-(3) of Lemma 13 and the clusters {i_2}, ..., {i_k}, ..., {i'_{k'}}, 
and then removing the labels i_2, ..., i_k, ..., i'_{k'} from all remaining clusters. 

□ 
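
Point (a) of Lemma 14 can be read as a direct recipe on the cluster multiset. The Python sketch below follows that recipe, assuming that C(N) is supplied as a list of clusters with repetitions and that the R_{i;j} reduction is applicable, so that {i} and {j} occur exactly once; the function name and the example tree are hypothetical.

```python
from collections import Counter

def apply_r_reduction(clusters, i, j):
    """Cluster multiset of R_{i;j}(N), computed from C(N) as in Lemma 14(a)."""
    remaining = Counter(frozenset(c) for c in clusters)
    # Remove the leaf clusters {i} and {j} ...
    remaining[frozenset([i])] -= 1
    remaining[frozenset([j])] -= 1
    # ... and then delete the label j from every remaining cluster.
    reduced = Counter()
    for cluster, mult in remaining.items():
        if mult > 0:
            reduced[cluster - {j}] += mult
    return reduced

# Tree ((1,2),3): root, parent of the leaves 1 and 2, and the three leaves.
before = [{1, 2, 3}, {1, 2}, {1}, {2}, {3}]
after = apply_r_reduction(before, 1, 2)
```

For this tree the result is the multiset {1, 3}, {1}, {3}, which is the cluster representation of the two-leaf tree on the labels 1 and 3; the computation never looks at the network itself, only at C(N).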

A3 Proof of Theorem 3 

As in the previous subsection, we split the proof of Theorem 3 into several lemmas 
to increase its readability. In the rest of this subsection, N stands for an arbitrary 
1-nested network on some given set S. Since the set S is fixed, for every node v 
of N, if A(v) = {i_1, ..., i_k} and B(v) = {j_1, ..., j_l}, we shall use the following 
notation to denote the tripartition θ(v): 

θ(v) = {i_1, ..., i_k | j_1, ..., j_l}. 

To simplify the notations, we shall denote by θ_I(N) the multiset of tripartitions 
of its internal nodes, which is obtained by removing from θ(N) one copy of every 
tripartition {i | ∅} with i ∈ S. 

Lemma 15. Two leaves i, j are tree leaves and siblings if, and only if, the following 
conditions are satisfied: 

(a) There exists an internal node v such that i, j ∈ A(v) and C(v) is contained in 
the cluster of any internal ancestor of i or j. 

(b) For every node w of N such that i, j ∈ C(w), it happens that either i, j ∈ A(w) 
or i, j ∈ B(w). 

Moreover, when i and j are sibling tree leaves, they are the only children of their 
parent if, and only if, the node v in point (a) is such that C(v) = {i, j}. 

Proof. If $i,j$ are two sibling tree leaves and $v$ is their common parent, then they
are strict descendants of $v$ and $C(v)$ is contained in the cluster of any ancestor of
$i$ or $j$. Let now $w$ be any node such that $i,j \in C(w)$. Then, $w$ is an ancestor of $v$.
If $v$ is a strict descendant of $w$, then $i,j$ are also strict descendants of $w$, and if
$v$ is a non-strict descendant of $w$, then $i,j$ are also non-strict descendants of $w$.
Therefore, either $i,j \in A(w)$ or $i,j \in B(w)$. This finishes the proof of the 'only if'
implication.

As far as the converse implication goes, the existence of the internal node $v$
with $i,j \in A(v)$ and such that $C(v)$ is contained in the cluster of every ancestor of $i$
or $j$ implies that there does not exist any internal node whose cluster contains one
of the labels $i,j$ but not the other, and therefore, by Lemma 9.(b), that $i$ and $j$ are
siblings. Let $v_0$ be a common parent of them: then, on the one hand, $i,j \in C(v_0)$
implies that $C(v) \subseteq C(v_0)$, and, on the other hand, since $i,j \in A(v)$, $v_0$ must be
a descendant of $v$, and therefore $C(v_0) \subseteq C(v)$. We conclude that $C(v_0) = C(v)$.

Let us prove now that $i,j \in A(v_0)$. Indeed, if one of them were a non-strict
descendant of $v_0$, then by (b) both would be non-strict descendants of it, so there
would exist paths from the root of $N$ to $i$ and $j$ that do not contain $v_0$. By Lemma
9.(d), and taking into account that $v_0$ is a parent of $i$ and $j$, this could only happen
if both $i$ and $j$ were hybrid leaves and $v_0$ were intermediate in their reticulation
cycles (if it were the split node of one of them, the corresponding hybrid leaf
would be a strict descendant of it by Lemma 10.(e)), which would contradict the
1-nested condition.

Let us prove now that $i$ and $j$ are tree leaves. Indeed, if, say, $i$ is a hybrid leaf
and $v'_0$ its other parent, then, since $i \in A(v_0)$, $v'_0$ is a descendant of $v_0$ and then
intermediate in the reticulation cycle for $i$ (which would have $v_0$ as split node).
Now, since $i \in C(v'_0)$, it must happen that $j \in C(v'_0)$ and, since $v'_0$ cannot be an
ancestor of $v_0$, we conclude that $j$ is also hybrid and that $v'_0$ is an ancestor of its
other parent. But then, $v'_0$ is also intermediate in the reticulation cycle for $j$ (which
consists of the arc $(v_0, j)$ and the merge path $v_0 \rightsquigarrow v'_0 \rightsquigarrow j$), which is impossible.
This shows that $i$ and, by symmetry, $j$ are tree leaves.

This finishes the proof that $i$ and $j$ are tree sibling leaves if, and only if, (a)
and (b) are satisfied; moreover, from this proof we deduce that we can take as
$v$ in (a) the common parent of $i$ and $j$. Now, as far as the last assertion in the
statement goes, if $i$ and $j$ are the only children of their common parent $v$, it is clear
that $C(v) = \{i,j\}$. Conversely, if $v$ has a child $u$ different from $i$ and $j$, then $u$
cannot be an ancestor of $i$ and $j$, and therefore any descendant leaf of it is an
element of $C(v)$ different from $i$ and $j$, which shows that $\{i,j\} \subsetneq C(v)$.

As a direct consequence of this lemma we obtain the following two results. 

Lemma 16. The $R_{i;j}$ reduction can be applied to $N$ if, and only if, the following
conditions are satisfied:

(1) There exists an internal node $v$ such that $\theta(v) = \{i,j \mid \emptyset\}$ and $C(v)$ is
contained in the cluster of any internal ancestor of $i$ or $j$.

(2) For every node $w$ of $N$ such that $i,j \in C(w)$, it happens either that $i,j \in A(w)$
or $i,j \in B(w)$. □
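The two conditions of Lemma 16 can be checked directly on the multiset of tripartitions. The following is only a minimal sketch: the encoding of a tripartition as a pair of frozensets `(A, B)`, the use of `collections.Counter` for the multiset, and the names `tri` and `r_applicable_from_tripartitions` are assumptions made for this illustration and do not come from the paper.

```python
from collections import Counter

def tri(A, B=()):
    """Hypothetical encoding of the tripartition {A | B} as a pair of frozensets."""
    return (frozenset(A), frozenset(B))

def r_applicable_from_tripartitions(theta, S, i, j):
    """Check conditions (1) and (2) of Lemma 16 on the multiset theta(N)."""
    # theta_I(N): remove one copy of {s | emptyset} for every label s in S.
    internal = Counter(theta)
    for s in S:
        internal[tri({s})] -= 1
    internal = +internal  # drop entries whose count fell to zero

    # Condition (1): some internal node has tripartition {i, j | emptyset} ...
    if internal[tri({i, j})] == 0:
        return False
    # ... and its cluster {i, j} lies inside every internal cluster meeting {i, j}.
    for (A, B) in internal:
        cluster = A | B
        if (i in cluster or j in cluster) and not {i, j} <= cluster:
            return False

    # Condition (2): whenever both labels are in C(w), both are strict or both non-strict.
    for (A, B) in theta:
        if i in (A | B) and j in (A | B):
            if not ({i, j} <= A or {i, j} <= B):
                return False
    return True

# Tripartitions of the tree ((1,2),3): three leaves, the parent of 1 and 2, the root.
tree = Counter([tri({1}), tri({2}), tri({3}), tri({1, 2}), tri({1, 2, 3})])
print(r_applicable_from_tripartitions(tree, {1, 2, 3}, 1, 2))
```

In a tree every descendant is strict, so all second components are empty and condition (2) holds trivially; the check is only informative on networks with hybrid nodes.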

Lemma 17. The $T_{i;j}$ reduction can be applied to $N$ if, and only if, the following
conditions are satisfied:

(1) There exists an internal node $v$ such that $i,j \in A(v)$, $\{i,j\} \subsetneq C(v)$, and $C(v)$
is contained in the cluster of any other internal ancestor of $i$ or $j$.

(2) For every node $w$ of $N$ such that $i,j \in C(w)$, it happens either that $i,j \in A(w)$
or $i,j \in B(w)$. □

Let us consider now the $G$ and $\overline{G}$ reductions. In contrast to the corresponding
lemmas in §A2, here we do not need to distinguish between $k' = 0$ and $k' > 0$.

Lemma 18. The $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction (with $k \geq k' \geq 0$) can be
applied to $N$ if, and only if, the following conditions are satisfied:

(1) $\{i \mid \emptyset\} \in \theta_I(N)$.

(2) For every $j = 1,\ldots,k$, $\{i_j,\ldots,i_k \mid i\} \in \theta(N)$ with multiplicity 1.

(3) For every $j = 1,\ldots,k'$, $\{i'_j,\ldots,i'_{k'} \mid i\} \in \theta(N)$ with multiplicity 1.

(4) For every $\theta(v) \in \theta_I(N)$, if $C(v)$ contains some label among
$i_1,\ldots,i_k,i'_1,\ldots,i'_{k'}$ and $\theta(v)$ is not listed in (2) or (3), then either
$i,i_1,\ldots,i_k,i'_1,\ldots,i'_{k'} \in A(v)$ or $i,i_1,\ldots,i_k,i'_1,\ldots,i'_{k'} \in B(v)$.

Proof. If $N$ contains a reticulation cycle $K$ consisting of the merge paths
$(u,v_1,\ldots,v_k,h)$ and $(u,v'_1,\ldots,v'_{k'},h)$ such that the only child of the hybrid
node $h$ is the tree leaf $i$ and each tree node $v_j$ (respectively, $v'_j$) has only one
child outside $K$ and it is the tree leaf $i_j$ (respectively, $i'_j$), then

$\theta(h) = \{i \mid \emptyset\}$
$\theta(v_j) = \{i_j,\ldots,i_k \mid i\}$, $j = 1,\ldots,k$
$\theta(v'_j) = \{i'_j,\ldots,i'_{k'} \mid i\}$, $j = 1,\ldots,k'$

and hence $\theta_I(N)$ contains all tripartitions listed in points (1)-(3). Let now $v$ be any
internal node different from $v_1,\ldots,v_k,v'_1,\ldots,v'_{k'}$ that is an ancestor of some leaf
$i_1,\ldots,i_k,i'_1,\ldots,i'_{k'}$, say that $i_j \in C(v)$. Then it will be an ancestor of its parent
$v_j$ and in particular $i \in C(v_j) \subseteq C(v)$. Then, by Lemma 10.(f), $u$ is a descendant
of $v$. Now, if $u$ is a strict descendant of $v$, then the leaves $i_1,\ldots,i_k,i'_1,\ldots,i'_{k'},i$
are also strict descendants of $v$, while if $u$ is a non-strict descendant of $v$, then they
are also non-strict descendants of $v$. This proves (4) and that all tripartitions listed in
(2) and (3) appear only once in $\theta(N)$ (Lemma 10.(g) did not guarantee it for $\theta(v_1)$
when $k' = 0$). This finishes the proof of the 'only if' implication.

Conversely, assume that (1)-(4) are satisfied. Using only the information of
the clusters and arguing as in the proof of the 'if' implication in Lemmas 12 and
13, we already deduce that the parent $h$ of $i$ is hybrid, it has out-degree 1 and
$i$ is a tree child of it (using Lemma 9.(a)), that the reticulation cycle $K$ for $h$
consists of two tree merge paths $(u,v_1,\ldots,v_l,h)$ and $(u,v'_1,\ldots,v'_{l'},h)$, with
$l \geq k$ and $l' \geq k'$, and that $\theta(v_{l-j}) = \{i_{k-j},\ldots,i_k \mid i\}$ for every
$j = 0,\ldots,k-1$, and $\theta(v'_{l'-j}) = \{i'_{k'-j},\ldots,i'_{k'} \mid i\}$ for every
$j = 0,\ldots,k'-1$. Now, if $l > k$, $C(v_{l-k})$ would strictly contain $C(v_{l-k+1})$ and $i_1$
would be a strict descendant of $v_{l-k}$ (because it is a strict descendant of its tree
child $v_{l-k+1}$) but $i$ would be a non-strict descendant of it, which would
contradict (4). This implies then that $k = l$ and $C(v_j) = \{i_j,\ldots,i_k,i\}$ for every
$j = 1,\ldots,k$, and then, by symmetry, $k' = l'$ and $C(v'_j) = \{i'_j,\ldots,i'_{k'},i\}$ for
every $j = 1,\ldots,k'$.

It remains to prove that the only child of every $v_j$ (respectively, $v'_j$) not belonging
to the reticulation cycle for $h$ is the corresponding tree leaf $i_j$ (respectively, $i'_j$).
Let us prove first that each $i_j$ is a tree leaf. Indeed, if $i_j$ were hybrid, then, since
$i_j \in A(v_j)$, $v_j$ would be an ancestor of the split node $w_j$ of the reticulation cycle
for $i_j$. Since $i_j$ only belongs to the clusters of the nodes in $K$ that are ancestors of
$v_j$, we conclude that $w_j$ does not belong to $K$ and then, since $i_j \in A(w_j)$, by (4)
we have that $i \in A(w_j)$ and hence, by Lemma 10.(f), that $w_j$ is a strict ancestor
of $u$, which is impossible.

Let now $u_j$ be a child of $v_j$ different from $i_j$ and from $v_j$'s child in $K$. This node
must be internal, because the other descendant leaves $i_{j+1},\ldots,i_k,i$ of $v_j$ are tree
leaves and descendants of proper descendants of $v_j$ in $K$, and therefore $v_j$ is not
their parent. Then, since $C(u_j) \subseteq C(v_j) = \{i_j,\ldots,i_k,i\}$, by (4) we conclude that
$i \in C(u_j)$ and hence, since $u_j$ does not belong to $K$, by Lemma 10.(f) we conclude
that $u_j$ is an ancestor of $u$, which is impossible. This shows that $v_j$ does not have
any child outside $K$ different from $i_j$, and moreover, since $i_j$ is a descendant of $v_j$,
that it is its child.

Lemma 19. The $\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction (with $k \geq k' \geq 0$) can be
applied to $N$ if, and only if, the following conditions are satisfied:

(1) $\{i\} \notin C_I(N)$.

(2) For every $j = 1,\ldots,k$, $\{i_j,\ldots,i_k \mid i\} \in \theta(N)$ with multiplicity 1.

(3) For every $j = 1,\ldots,k'$, $\{i'_j,\ldots,i'_{k'} \mid i\} \in \theta(N)$ with multiplicity 1.

(4) For every $\theta(v) \in \theta_I(N)$, if $C(v)$ contains some label among
$i_1,\ldots,i_k,i'_1,\ldots,i'_{k'}$ and $\theta(v)$ is not listed in (2) or (3), then either
$i,i_1,\ldots,i_k,i'_1,\ldots,i'_{k'} \in A(v)$ or $i,i_1,\ldots,i_k,i'_1,\ldots,i'_{k'} \in B(v)$.

Proof. If $N$ contains a reticulation cycle $K$ consisting of the merge paths
$(u,v_1,\ldots,v_k,h)$ and $(u,v'_1,\ldots,v'_{k'},h)$ such that the hybrid node $h$ is the leaf $i$
and each tree node $v_j$ (respectively, $v'_j$) has only one child outside $K$ and it is the
tree leaf $i_j$ (respectively, $i'_j$), then, by Lemma 9.(a), $\{i\} \notin C_I(N)$, and

$\theta(v_j) = \{i_j,\ldots,i_k \mid i\}$, $j = 1,\ldots,k$
$\theta(v'_j) = \{i'_j,\ldots,i'_{k'} \mid i\}$, $j = 1,\ldots,k'$

and hence $N$ satisfies (1)-(3). The rest of the 'only if' implication can be proved
as in Lemma 18.

Conversely, assume that (1)-(4) are satisfied. To begin with, let us prove that
$i$ is a hybrid leaf. Indeed, if it were a tree leaf, then its parent $v$ would be a strict
ancestor of $i$, and therefore $\theta(v)$ would be none of the tripartitions listed in (2) or
(3). On the other hand, $v$ would be a descendant of the node $w$ having tripartition
$\{i_k \mid i\}$, which would imply, since $\{i\} \neq C(v)$ by (1), that $i_k \in C(v)$. Then, by (4)
and since $i \in A(v)$, $i_k$ would also be a strict descendant of $v$. This would imply
that $w$ is a strict ancestor of $v$: any path $r \rightsquigarrow v$ not containing $w$ followed by a
path $v \rightsquigarrow i_k$ (that does not contain $w$ because $w$ is an ancestor of $v$) would form
a path $r \rightsquigarrow i_k$ not containing $w$, against the assumption that $i_k \in A(w)$. But then
the tree child $i$ of $v$ would be also a strict descendant of $w$, which would contradict
the assumption that $i \in B(w)$.

Let us also denote by $h$ this hybrid leaf labeled with $i$, so that $C(h) = \{i\}$.
Since we can still apply Lemma 10, the same argument as in the proof of the 'if'
implication in Lemma 18 implies that the reticulation cycle $K$ for $h$ consists of two
merge paths $(u,v_1,\ldots,v_k,h)$ and $(u,v'_1,\ldots,v'_{k'},h)$ such that
$C(v_j) = \{i_j,\ldots,i_k,i\}$ for every $j = 1,\ldots,k$, and $C(v'_j) = \{i'_j,\ldots,i'_{k'},i\}$ for
every $j = 1,\ldots,k'$.

The proof that the only child of every $v_j$ (respectively, $v'_j$) not belonging to $K$
is the corresponding tree leaf $i_j$ (respectively, $i'_j$) is also similar to the one given
for the corresponding fact in Lemma 18, and we do not repeat it here.

Lemmas 16 to 19 prove condition (A) in Lemma 6 for the $R$, $T$, $G$, and $\overline{G}$
reductions and the tripartitions representation. As far as point (R) goes, it is a
consequence of the following lemma.

Lemma 20. (a) If the $R_{i;j}$ reduction can be applied to $N$, then $\theta(R_{i;j}(N))$ is
obtained by removing the tripartitions $\{i \mid \emptyset\}$ and $\{j \mid \emptyset\}$ from $\theta(N)$, and then
removing the label $j$ from all remaining tripartitions in $\theta(N)$.

(b) If the $T_{i;j}$ reduction can be applied to $N$, then $\theta(T_{i;j}(N))$ is obtained by
removing the label $j$ from all tripartitions in $\theta(N)$.

(c) If the $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction can be applied to $N$, then
$\theta(G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N))$ is obtained by first removing from $\theta(N)$ all
tripartitions listed in points (1)-(3) of Lemma 18 and the tripartitions
$\{i_2 \mid \emptyset\},\ldots,\{i_k \mid \emptyset\},\{i'_1 \mid \emptyset\},\ldots,\{i'_{k'} \mid \emptyset\}$, and then removing the
labels $i_2,\ldots,i_k,i'_1,\ldots,i'_{k'}$ from all remaining tripartitions.

(d) If the $\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction can be applied to $N$, then
$\theta(\overline{G}_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N))$ is obtained by first removing from $\theta(N)$ all
tripartitions listed in point (2) of Lemma 19 and the tripartitions
$\{i_2 \mid \emptyset\},\ldots,\{i_k \mid \emptyset\},\{i'_1 \mid \emptyset\},\ldots,\{i'_{k'} \mid \emptyset\}$, and then removing the
labels $i_2,\ldots,i_k,i'_1,\ldots,i'_{k'}$ from all remaining tripartitions.
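Points (a) and (b) of Lemma 20 describe purely combinatorial updates of the multiset of tripartitions, which can be sketched in a few lines. The encoding below (pairs of frozensets inside a `collections.Counter`) and the names `tri`, `strip_label`, `reduce_R` and `reduce_T` are assumptions of this illustration, not notation from the paper. One further assumption is made explicit in the code: a tripartition that becomes empty after removing the label $j$ (the one of the removed leaf $j$ itself) is discarded.

```python
from collections import Counter

def tri(A, B=()):
    """Hypothetical encoding of the tripartition {A | B} as a pair of frozensets."""
    return (frozenset(A), frozenset(B))

def strip_label(theta, j):
    """Remove label j from every tripartition, keeping multiplicities."""
    out = Counter()
    for (A, B), mult in theta.items():
        A2, B2 = A - {j}, B - {j}
        if A2 or B2:  # assumption: an emptied tripartition is dropped
            out[(A2, B2)] += mult
    return out

def reduce_R(theta, i, j):
    """Lemma 20.(a): drop {i | 0} and {j | 0}, then remove the label j."""
    rest = Counter(theta)
    rest[tri({i})] -= 1
    rest[tri({j})] -= 1
    return strip_label(+rest, j)

def reduce_T(theta, j):
    """Lemma 20.(b): remove the label j from all tripartitions."""
    return strip_label(theta, j)

# Tree ((1,2),3): R_{1;2} should leave the cherry on leaves 1 and 3.
tree = Counter([tri({1}), tri({2}), tri({3}), tri({1, 2}), tri({1, 2, 3})])
print(reduce_R(tree, 1, 2))
```

After `reduce_R`, the former parent of the leaves 1 and 2 contributes the tripartition of the new leaf 1, which is why no copy of it has to be re-added by hand.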



A4 Proof of Theorem 4 

We also split the proof of Theorem 4 into several lemmas, in parallel to the preceding
subsections. In the rest of this subsection, $N$ stands for a semibinary 1-nested
network on $S = \{1,\ldots,n\}$. Notice that all leaves in $N$ are of tree type.

For every pair of nodes u, v in N, we shall denote by CA(u, v) the set of 
common ancestors of u and v. By Lemma 8, [u, v] is the element of CA(u, v) that 
is a descendant of all other nodes in this set. 

The following result summarizes what Lem. 5 and Cor. 4 in [11] say about N. 
Although these results were stated therein for tree-child time consistent evolution- 
ary networks with out-degree 1 hybrid nodes, it is straightforward to check that 
the time consistency is not used anywhere in their proofs, and therefore their thesis 
also holds for tree-child (and, in particular, for 1-nested) semibinary hybridization 
networks. In the following statement, and henceforth, by saying that a leaf j is a 
quasi-sibling of a leaf i, we mean that the parent of j is a hybrid node that is a 
sibling of i: cf. Fig. 11. 



Fig. 11. j is a quasi-sibling of i.



Lemma 21. Let $i,j$ be any labels in $S$.

(a) $\ell_N(i,j) = 1$ if, and only if, the parent of $i$ is an ancestor of $j$.

(b) The leaves $i,j$ are siblings if, and only if, $\ell_N(i,j) = \ell_N(j,i) = 1$.

(c) The leaf $j$ is a quasi-sibling of the leaf $i$ if, and only if, $\ell_N(i,j) = 1$,
$\ell_N(j,i) = 2$, and $\ell_N(j,k) > 1$ for every $k \in S \setminus \{i,j\}$. □
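Lemma 21 can be illustrated on a small network in which $j$ is a quasi-sibling of $i$. The sketch below assumes that $\ell_N(i,j)$ is the length of a shortest path from $[i,j]$ to $i$, which is how the quantity is used in the proofs of this subsection; the dictionary encoding of the network, the node names, and the function names `ancestors`, `lca`, `distance` and `ell` are all hypothetical.

```python
from collections import deque

def ancestors(children, node):
    """Set of ancestors of `node` (the node itself included)."""
    parents = {}
    for p, cs in children.items():
        for c in cs:
            parents.setdefault(c, set()).add(p)
    seen, stack = {node}, [node]
    while stack:
        x = stack.pop()
        for p in parents.get(x, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def lca(children, a, b):
    """The common ancestor [a, b] that is a descendant of all other common ancestors."""
    common = ancestors(children, a) & ancestors(children, b)
    for c in common:
        if common <= ancestors(children, c):
            return c
    raise ValueError("no common ancestor below all the others")

def distance(children, src, dst):
    """Length of a shortest directed path from src to dst (breadth-first search)."""
    queue, dist = deque([src]), {src: 0}
    while queue:
        x = queue.popleft()
        if x == dst:
            return dist[x]
        for c in children.get(x, ()):
            if c not in dist:
                dist[c] = dist[x] + 1
                queue.append(c)
    raise ValueError("dst is not a descendant of src")

def ell(children, i, j):
    """Assumed definition: distance from [i, j] to i."""
    return distance(children, lca(children, i, j), i)

# Hybrid node h has parents a and v; its only child is the leaf j, and i is a child of v.
network = {"r": ["a", "v"], "a": ["x", "h"], "v": ["i", "h"], "h": ["j"]}
print(ell(network, "i", "j"), ell(network, "j", "i"), ell(network, "j", "x"))
```

Here the parent `h` of `j` is a sibling of `i`, and the three printed values are the ones that point (c) of the lemma constrains.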

As a consequence of this lemma, we have the following results. 

Lemma 22. The $R_{i;j}$ reduction can be applied to $N$ if, and only if,
$\ell_N(i,j) = \ell_N(j,i) = 1$ and $\ell_N(i,k) > 1$ for every $k \in S \setminus \{i,j\}$.

Proof. The $R_{i;j}$ reduction can be applied to $N$ if, and only if, $i,j$ are sibling leaves
and their parent has out-degree 2. By the previous lemma we already know that
$\ell_N(i,j) = \ell_N(j,i) = 1$ if, and only if, $i,j$ are sibling leaves. Thus, it only remains
to prove that the parent of $i$ and $j$ has out-degree 2 if, and only if, $\ell_N(i,k) > 1$
for every $k \in S \setminus \{i,j\}$. Now, if there is a leaf $k \neq j$ such that $\ell_N(i,k) = 1$, then
the parent of $i$ and $j$ is also an ancestor of $k$, which means that it has out-degree
at least 3. Conversely, if the parent of $i$ and $j$ has out-degree at least 3 and $v$ is a
child of it other than $i$ and $j$, then $\ell_N(i,k) = 1$ for every descendant leaf $k$ of $v$.

A similar argument, using that the $T_{i;j}$ reduction can be applied to $N$ if, and
only if, $i$ and $j$ are tree sibling leaves and their parent has some other child, proves
the following result.

Lemma 23. The $T_{i;j}$ reduction can be applied to $N$ if, and only if,
$\ell_N(i,j) = \ell_N(j,i) = 1$ and there exists some $k \in S \setminus \{i,j\}$ such that $\ell_N(i,k) = 1$. □
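The criteria of Lemmas 22 and 23 only read entries of the matrix $\ell(N)$, so they translate into two short predicates. This is a sketch under the assumption that $\ell_N$ is stored as a dictionary indexed by ordered pairs of distinct labels; the names `r_applicable` and `t_applicable` are hypothetical.

```python
def r_applicable(ell, S, i, j):
    """Lemma 22: i, j are siblings and no other leaf k has ell(i, k) = 1."""
    if ell[(i, j)] != 1 or ell[(j, i)] != 1:
        return False
    return all(ell[(i, k)] > 1 for k in S if k not in (i, j))

def t_applicable(ell, S, i, j):
    """Lemma 23: i, j are siblings and some other leaf k has ell(i, k) = 1."""
    if ell[(i, j)] != 1 or ell[(j, i)] != 1:
        return False
    return any(ell[(i, k)] == 1 for k in S if k not in (i, j))

S = [1, 2, 3]
# Tree ((1,2),3): leaves 1 and 2 form a cherry below the root.
cherry = {(1, 2): 1, (2, 1): 1, (1, 3): 2, (3, 1): 1, (2, 3): 2, (3, 2): 1}
# Star tree: the root is the parent of the three leaves.
star = {(a, b): 1 for a in S for b in S if a != b}

print(r_applicable(cherry, S, 1, 2), t_applicable(cherry, S, 1, 2))
print(r_applicable(star, S, 1, 2), t_applicable(star, S, 1, 2))
```

For sibling leaves the two predicates are complementary, which matches the fact that the parent of $i$ and $j$ has either out-degree 2 or out-degree at least 3.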

We have now the following lemmas for the $G$ reductions.

Lemma 24. The $G_{i;i_1,\ldots,i_k;\emptyset}$ reduction can be applied to $N$ if, and only if, the
following conditions are satisfied:

(1) $\ell_N(i,l) > 1$ for every $l \in S \setminus \{i\}$.

(2) $\ell_N(i_k,i) = 1$ and $\ell_N(i,i_k) = 2$.

(3) For every $j = 1,\ldots,k-1$, $\ell_N(i_j,i_{j+1}) = 1$ and $\ell_N(i_{j+1},i_j) = 2$.

(4) For every $j = 1,\ldots,k$ and for every $l \notin \{i_j,\ldots,i_k,i\}$, $\ell_N(i_j,l) > 1$.

(5) For every $j = 1,\ldots,k$, $\ell_N(i,i_j) = k - j + 2$.

(6) For every $l \notin \{i,i_1,\ldots,i_k\}$, $\ell_N(i,l) = \ell_N(i_1,l)$ and $\ell_N(l,i) = \ell_N(l,i_1)$.
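The six conditions of Lemma 24 can be tested mechanically once $\ell(N)$ is known. The sketch below assumes that $\ell_N$ is given as a dictionary on ordered pairs of distinct labels and that the labels $i_1,\ldots,i_k$ are passed as a non-empty list `chain`; the function name `g_applicable` and the example values are hypothetical, the latter computed by hand for the network with arcs r→u, r→x, u→v1, u→h, v1→i1, v1→h, h→i.

```python
def g_applicable(ell, S, i, chain):
    """Check conditions (1)-(6) of Lemma 24 for the leaf i and chain = [i_1, ..., i_k]."""
    k = len(chain)
    others = [l for l in S if l != i and l not in chain]
    c1 = all(ell[(i, l)] > 1 for l in S if l != i)
    c2 = ell[(chain[-1], i)] == 1 and ell[(i, chain[-1])] == 2
    c3 = all(ell[(chain[t], chain[t + 1])] == 1 and ell[(chain[t + 1], chain[t])] == 2
             for t in range(k - 1))
    # chain[t] is i_{t+1}; the excluded labels in (4) are i_{t+1}, ..., i_k and i.
    c4 = all(ell[(chain[t], l)] > 1
             for t in range(k) for l in S if l != i and l not in chain[t:])
    # With j = t + 1, the value k - j + 2 in (5) becomes k - t + 1.
    c5 = all(ell[(i, chain[t])] == k - t + 1 for t in range(k))
    c6 = all(ell[(i, l)] == ell[(chain[0], l)] and ell[(l, i)] == ell[(l, chain[0])]
             for l in others)
    return c1 and c2 and c3 and c4 and c5 and c6

S = ["i", "i1", "x"]
# Reticulation cycle with merge paths (u, v1, h) and (u, h), so k = 1.
ell_cycle = {("i", "i1"): 2, ("i1", "i"): 1,
             ("i", "x"): 3, ("x", "i"): 1,
             ("i1", "x"): 3, ("x", "i1"): 1}
print(g_applicable(ell_cycle, S, "i", ["i1"]))
```

On a tree such as ((1,2),3) the check fails already at condition (1), because a sibling of the candidate leaf gives $\ell_N(i,l) = 1$.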

Proof. Assume that $N$ contains a reticulation cycle $K$ consisting of the merge
paths $(u,v_1,\ldots,v_k,h)$ and $(u,h)$ such that the only child of the hybrid node $h$ is
the leaf $i$ and each tree node $v_j$ has only one child outside $K$, and it is the tree leaf
$i_j$. Then, (1) and (2) are satisfied because $i$ is a quasi-sibling of $i_k$, (3) is satisfied
because the parent $v_{j+1}$ of each $i_{j+1}$ is a sibling of $i_j$, (4) is satisfied because the
only descendant leaves of the parent $v_j$ of $i_j$ are $i_j,i_{j+1},\ldots,i_k,i$, and (5) is satisfied
because $[i_j,i] = v_j$ and the only path $v_j \rightsquigarrow i$ has length $k - j + 2$. As far as condition
(6) goes, let $l$ be any label different from $i,i_1,\ldots,i_k$. Then, $l$ is not a descendant
of $v_1$ and therefore every common ancestor of $i$ or $i_1$ and $l$ must be an ancestor of
$u$. This implies that $CA(i,l) = CA(u,l) = CA(i_1,l)$, from where we deduce that
$[i,l] = [u,l] = [i_1,l]$. This clearly implies that $\ell_N(l,i) = \ell_N(l,i_1)$. On the other
hand, any shortest path $[u,l] \rightsquigarrow i$ will consist of a shortest path $[u,l] \rightsquigarrow u$ followed
by the path $(u,h,i)$, and any shortest path $[u,l] \rightsquigarrow i_1$ will consist of a shortest path
$[u,l] \rightsquigarrow u$ followed by the path $(u,v_1,i_1)$, which implies that $\ell_N(i,l) = \ell_N(i_1,l)$.

Conversely, assume that $N$ satisfies conditions (1) to (6). Then, conditions (1)
and (2) imply that $i$ is a quasi-sibling of $i_k$: let $h$ be the hybrid parent of $i$ and let
$v_k$ be the parent of $i_k$ and $h$, which will be a tree node because it has out-degree
at least 2. Now, condition (3) implies that, for every $j = 1,\ldots,k-1$, the parent
of $i_j$ is also parent of the parent of $i_{j+1}$: if we let $v_j$ be the parent of $i_j$, for every
$j = 1,\ldots,k-1$, we obtain a path $(v_1,\ldots,v_k)$ consisting of tree nodes (because
each node in it has out-degree at least 2) and such that each $v_j$ is the parent of
the leaf $i_j$.

Now, $v_k$ may be either intermediate in the reticulation cycle $K$ for $h$ or the split
node of $K$ (in which case one of the merge paths would be the arc $(v_k,h)$). But, if
the latter happened, $h$ would have another parent $v$ and it would be a descendant
of $v_k$, and then, any tree descendant leaf $l$ of $v$ would be such that $\ell_N(i_k,l) = 1$,
which would contradict (4). This implies that $v_k$ is intermediate in $K$.

Let now $v$ be the other parent of $h$, and assume that it is intermediate in
the merge path of $K$ not containing $v_k$. Let $l$ be a tree descendant leaf of $v$.
By Lemma 10.(d), $l \notin \{i_1,\ldots,i_k\}$. Then, by (6), $\ell_N(i_1,l) = \ell_N(i,l) = 2$ and
$\ell_N(l,i) = \ell_N(l,i_1)$. But the latter condition implies that $[l,i_1] = [l,i] = v$, and then
the former implies that $v$ is the parent of $v_1$, which would imply that $i_1 \in C(v)$,
leading to a contradiction again by Lemma 10.(d). We conclude that the merge
path not containing $v_k$ is a single arc. In particular, this implies that no node
$v_1,\ldots,v_{k-1}$ is the split node of $K$: if $v_j$ were the split node of $K$, then $\ell_N(i,i_j) = 2$,
against (5). So, the split node $u$ of $K$ is a proper ancestor of $v_1$. Let us see that
$u$ is the parent of $v_1$. Indeed, if $u$ were not the parent $w$ of $v_1$, then $w$ would be
intermediate in the merge path $u \rightsquigarrow v_1 \rightsquigarrow h$: let $w'$ be a child of $w$ outside $K$, and
let $l$ be a tree descendant leaf of $w'$. Then, since $l \notin \{i,i_1,\ldots,i_k\}$, (6) would imply
that $\ell_N(i,l) = \ell_N(i_1,l) = 2$, while it is clear that $\ell_N(i,l) = k + 2$ (because $[l,i] = w$
and the only path $w \rightsquigarrow i$, along the merge path, has length $k + 2$).

In summary, we have proved so far that if $N$ satisfies conditions (1) to (6),
then it contains a reticulation cycle for the hybrid parent $h$ of $i$ consisting of the
merge paths $(u,v_1,\ldots,v_k,h)$ and $(u,h)$, and that each $v_j$ is the parent of the tree
leaf $i_j$. It remains to prove that $v_1,\ldots,v_k$ have out-degree 2. But, if some $v_j$ had
some child $w_j$ other than $i_j$ or its child in $K$, and if $l$ were a tree descendant leaf
of $w_j$, then $l \notin \{i,i_j,\ldots,i_k\}$ but $\ell_N(i_j,l) = 1$, against (4).

Lemma 25. The $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction (with $k \geq k' > 0$) can be
applied to $N$ if, and only if, the following conditions are satisfied:

(1) $\ell_N(i,l) > 1$ for every $l \in S \setminus \{i\}$.

(2) $\ell_N(i_k,i) = 1$ and $\ell_N(i,i_k) = 2$.
(2') $\ell_N(i'_{k'},i) = 1$ and $\ell_N(i,i'_{k'}) = 2$.

(3) For every $j = 1,\ldots,k-1$, $\ell_N(i_j,i_{j+1}) = 1$ and $\ell_N(i_{j+1},i_j) = 2$.
(3') For every $j = 1,\ldots,k'-1$, $\ell_N(i'_j,i'_{j+1}) = 1$ and $\ell_N(i'_{j+1},i'_j) = 2$.

(4) For every $j = 1,\ldots,k$ and for every $l \notin \{i_j,\ldots,i_k,i\}$, $\ell_N(i_j,l) > 1$.
(4') For every $j = 1,\ldots,k'$ and for every $l \notin \{i'_j,\ldots,i'_{k'},i\}$, $\ell_N(i'_j,l) > 1$.

(5) $\ell_N(i_1,i'_1) = \ell_N(i'_1,i_1) = 2$.

Proof. Assume that $N$ contains a reticulation cycle $K$ consisting of the merge
paths $(u,v_1,\ldots,v_k,h)$ and $(u,v'_1,\ldots,v'_{k'},h)$, with $k \geq k' > 0$, such that the only
child of the hybrid node $h$ is the leaf $i$, each tree node $v_j$ has only one child outside
$K$, and it is the tree leaf $i_j$, and each tree node $v'_j$ has only one child outside $K$,
and it is the tree leaf $i'_j$. The proof that it satisfies the conditions (1) to (4) and
(2') to (4') is similar to the corresponding proof in the previous lemma, and (5)
is a direct consequence of the fact that $[i_1,i'_1] = u$ (because $v_1$ and $v'_1$ are not
connected by a path by Lemma 10.(a)).

Conversely, assume that $N$ satisfies all conditions listed in the statement.
Conditions (1), (2) and (2') imply that $i$ is a quasi-sibling of $i_k$ and $i'_{k'}$: let $h$ be the
hybrid parent of $i$, and let $v_k$ and $v'_{k'}$ be, respectively, the parents of $i_k$ and $i'_{k'}$.
As in the previous lemma, conditions (3) and (3') imply the existence of paths
$(v_1,\ldots,v_k)$ and $(v'_1,\ldots,v'_{k'})$ consisting of tree nodes and such that each $v_j$ is the
parent of the leaf $i_j$ and each $v'_j$ is the parent of the leaf $i'_j$.

Now, no node $v_1,\ldots,v_k,v'_1,\ldots,v'_{k'}$ is the split node of $K$: if, say, $v_j$ were the
split node of $K$, then in particular $v'_{k'}$, and hence $i'_{k'}$, would be a descendant of
$v_j$, which would imply that $v_j = [i_j,i'_{k'}]$ and thus $\ell_N(i_j,i'_{k'}) = 1$, against (4).
Therefore, the split node of $K$ is a common ancestor of $v_1$ and $v'_1$. Now, (5) implies
that $[i_1,i'_1]$ is simultaneously the parent of $v_1$ and $v'_1$, and therefore that this parent
is the split node $u$ of $K$.

Finally, the proof that the intermediate nodes of K have out-degree 2 is similar 
to the proof of the corresponding fact in the previous lemma, using (4) and (4'). 

Lemmas 22 to 25 imply that the possibility of applying a specific $R$, $T$ or $G$
reduction to $N$ depends only on $\ell(N)$, from where condition (A) in Theorem 4
follows. As far as condition (R) goes, we have the following lemma.

Lemma 26. (a) If the $R_{i;j}$ reduction can be applied to $N$, then, for every
$k,l \in S \setminus \{j\}$,

- $\ell_{R_{i;j}(N)}(i,k) = \ell_N(i,k) - 1$ if $k \neq i$.
- $\ell_{R_{i;j}(N)}(k,i) = \ell_N(k,i)$ if $k \neq i$.
- $\ell_{R_{i;j}(N)}(k,l) = \ell_N(k,l)$ if $k,l \neq i$.

(b) If the $T_{i;j}$ reduction can be applied to $N$, then, for every $k,l \in S \setminus \{j\}$,

$\ell_{T_{i;j}(N)}(k,l) = \ell_N(k,l)$.

(c) If the $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}$ reduction (with $k \geq k' \geq 0$) can be applied to $N$,
then, for every $j,l \in S \setminus \{i_2,\ldots,i_k,i'_1,\ldots,i'_{k'}\}$,

Proof. (a) $R_{i;j}(N)$ is obtained by removing the leaf $j$ and replacing the leaf $i$ by its
parent. This implies that, for every pair of remaining leaves, their LCA is the same
node in $N$ and in $R_{i;j}(N)$, and that any path ending in $i$ is shortened by one arc,
while all paths ending in any other remaining leaf are left untouched. The formulas
for $\ell_{R_{i;j}(N)}$ given in the statement follow immediately from these observations.

(b) $T_{i;j}(N)$ is obtained by removing the leaf $j$ without modifying anything else.
This implies that, for every pair of remaining leaves, their LCA is the same node in
$N$ and in $T_{i;j}(N)$ and no path ending in a remaining leaf is modified, and therefore
that $\ell_{T_{i;j}(N)} = \ell_N$ on $S \setminus \{j\}$.

(c) Let us denote by $N'$ the network $G_{i;i_1,\ldots,i_k;i'_1,\ldots,i'_{k'}}(N)$ and let $u$ be the split
node of the removed reticulation cycle. We remove all (and only) descendants of
$u$, and we add to $u$ two new tree leaf children $i$ and $i_1$. This implies that the LCA
in $N'$ of $i$ and $i_1$ is $u$ (and therefore $\ell_{N'}(i,i_1) = \ell_{N'}(i_1,i) = 1$) and that the LCA
of any other pair of remaining leaves is the same node in $N'$ as in $N$. On the other
hand, any path ending in $i_1$ is shortened by one arc, the distance from any internal
node to $i$ in $N'$ is the same as its distance to $i_1$, and all paths ending in remaining
leaves other than $i$ or $i_1$ are not touched. From these observations, the formulas
for $\ell_{N'}$ given in the statement easily follow.
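Points (a) and (b) of Lemma 26 say how the matrix $\ell(N)$ changes under the $R$ and $T$ reductions, and the update is simple enough to sketch. As before, storing $\ell_N$ as a dictionary on ordered pairs of distinct labels is an assumption of this illustration, and the names `reduce_R_ell` and `reduce_T_ell` are hypothetical.

```python
def reduce_R_ell(ell, S, i, j):
    """Lemma 26.(a): entries (i, k) drop by one, all other surviving entries are kept."""
    rest = [s for s in S if s != j]
    new = {}
    for k in rest:
        for l in rest:
            if k != l:
                new[(k, l)] = ell[(k, l)] - 1 if k == i else ell[(k, l)]
    return new

def reduce_T_ell(ell, S, j):
    """Lemma 26.(b): restrict ell to the labels other than j."""
    rest = [s for s in S if s != j]
    return {(k, l): ell[(k, l)] for k in rest for l in rest if k != l}

S = [1, 2, 3]
# Tree ((1,2),3); after R_{1;2} the leaves 1 and 3 form a cherry below the root.
cherry = {(1, 2): 1, (2, 1): 1, (1, 3): 2, (3, 1): 1, (2, 3): 2, (3, 2): 1}
print(reduce_R_ell(cherry, S, 1, 2))
```

Only the entries whose first argument is $i$ change under $R_{i;j}$, because $\ell_N(i,k)$ measures a path ending in $i$, and those are exactly the paths shortened by one arc.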